SYSTEMATIC AND RANDOM SAMPLING FOR ESTIMATING
EGG PRODUCTION IN POULTRY* 


A. W. NORDSKOG AND S. LEE CRUMP


Iowa State College†


INTRODUCTION 


THE PROBLEM OF determining the ideal period for incomplete trapnesting to estimate egg production has been investigated during
the last thirty years by several workers. Thompson (1933), Olson (1939) 
and Hays (1946) have given good reviews of the subject. 

If the heritability of egg production were high, mass selection of 
breeding stock would be an efficient method of attaining genetic im- 
provement (Lush, 1946) and the accuracy of the production records for 
individual hens would be of primary importance. Daily trapnesting 
would provide accurate individual records. That mass selection is not 
an efficient method in breeding for improved egg production has long 
been recognized (Gowell, 1903). Lerner and Hazel (1947) estimated 
that the heritability of egg production based on individual hens was 
about 5 per cent in the flock that they studied. 

Greater breeding progress is possible by selection of sire-progeny 
and family groups than by simple mass selection. Thus, the accuracy 
of the average record for the group is most important, while that for 
individual records is secondary. Errors in individual records are only 
one source of error in the average record for the group. The other im- 
portant source of error is the variation caused by true differences in 
productive ability among the individuals of a family or sire-progeny 
group. 

In the development and testing of inbred lines and crosses among 
them, it is again the accuracy of the record for the group which is of 
primary importance. Since trapnesting is costly, there is need for an 
accurate assessment of the relative sizes of the errors in group records 
arising from various sources. It is the purpose of this paper to report 
the relative sizes of errors arising from incomplete trapnesting of 
individual hens, and from variation among the hens of a group, under 
several schemes of incomplete trapnesting. 


*Journal Paper No. J-1535 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project 
No. 54. 
†Joint contribution from the Poultry Section and the Statistical Laboratory.

THE DATA


The data include the complete trapnest records for 20 top-cross White 
Leghorns (inbred males × non-inbred females), selected at random,
from each of five sire-progeny groups. A restriction imposed on the 
data was that only those hens finishing their first laying year were 
selected (1940-41). All the hens were housed together in two adjoining 
pens, each containing about 150 birds. The average production for the 
100 hens used was 173 eggs per hen per year. 

Because of the seasonal fluctuation in egg production, any system of 
incomplete trapnesting to estimate total annual egg production should 
extend over the whole year. One obvious way of guaranteeing this is to 
consider only systems which include at least one day in each month of 
the year. Accordingly, the complete record for each hen was divided into 
28 basic sampling units formed as follows: the first basic sampling unit 
includes the record for the first of each month, the second includes the 
record for the second of each month, etc. For convenience, only the 
first 28 days of each month were used. The results which would have 
been obtained using the full monthly record for each hen would not 
differ materially from those reported here. It should be noted that each 
of the basic sampling units is itself a systematic sample of the days of
the year as defined by Madow (1946). Now, if the basic sampling units
for each hen are denoted by

$$x_1,\; x_2,\; \ldots,\; x_{28},$$


the systems of incomplete trapnesting considered in this paper may be 
defined as follows: 


A. Interval-day trapnesting. The trapnesting days are spaced at regular intervals within the months. There are $d_a = 28/a$ possible interval-day samples of size $a$ (i.e., samples which include $a$ of the basic sampling units). Let $S_k(a)$ denote the $k$th interval-day sample of size $a$, $k = 1, 2, \ldots, d_a$. Then $S_k(a)$ consists of the following basic sampling units:

$$x_k,\; x_{k+d_a},\; x_{k+2d_a},\; \ldots,\; x_{k+(a-1)d_a}$$

To obtain an interval-day sample of size $a$, one of the samples $S_1(a), S_2(a), \ldots, S_{d_a}(a)$ is selected at random.

To illustrate, when trapnesting is to be carried out on $a = 4$ days per month there are $d_4 = 28/4 = 7$ possible interval-day samples. Two of the possible samples are made up as follows:

$$S_2(4):\; x_2,\; x_9,\; x_{16},\; x_{23}$$

$$S_5(4):\; x_5,\; x_{12},\; x_{19},\; x_{26}$$


B. Consecutive-day trapnesting. The trapnesting days within each month are consecutive. As in the preceding case there are $d_a = 28/a$ possible consecutive-day samples of size $a$. Let $S'_k(a)$ denote the $k$th consecutive-day sample of size $a$, $k = 1, 2, \ldots, d_a$. Then $S'_k(a)$ consists of the following basic sampling units:

$$x_{a(k-1)+1},\; x_{a(k-1)+2},\; \ldots,\; x_{a(k-1)+a}$$

To obtain a consecutive-day sample of size $a$, one of the samples $S'_1(a), S'_2(a), \ldots, S'_{d_a}(a)$ is selected at random.

Using again the previous illustration, there are 7 possible consecutive-day samples. Two of the samples are made up as follows:

$$S'_3(4):\; x_9,\; x_{10},\; x_{11},\; x_{12}$$

$$S'_7(4):\; x_{25},\; x_{26},\; x_{27},\; x_{28}$$


C. Random-day trapnesting. The trapnesting days within each month are randomly distributed. To obtain a random-day sample of size $a$, $a$ of the samples

$$S_1(1),\; S_2(1),\; \ldots,\; S_{28}(1)$$

are selected at random. The samples $S_1(1), S_2(1), \ldots, S_{28}(1)$ are, of course, just the basic sampling units.

When $a = 1$ the three systems are identical. Since each basic sampling unit includes one day in each month, the total number of trapnest days in any sample of size $a$ is $12a$. In the present discussion $a$ takes the values 1, 2, 4, 7 and 14. These values are convenient since they are factors of 28, so that $d_a = 28/a$ is a whole number.
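The three schemes can be made concrete with a short sketch. It is only an illustration under stated assumptions: the basic sampling units are numbered 1 to 28 (standing for $x_1, \ldots, x_{28}$), the helper names `interval_day`, `consecutive_day` and `random_day` are hypothetical, and the random-day draw is assumed to be without replacement, since the text says only that $a$ of the 28 units are selected at random.

```python
import random

N_UNITS = 28  # basic sampling units x_1 ... x_28 (days 1-28 of every month)


def _check_size(a: int) -> int:
    """Return d_a = 28 / a, rejecting sizes that do not divide 28."""
    if a < 1 or N_UNITS % a != 0:
        raise ValueError("a must be a factor of 28")
    return N_UNITS // a


def interval_day(a: int, k: int) -> list[int]:
    """Units in S_k(a): x_k, x_{k+d_a}, ..., x_{k+(a-1)d_a}."""
    d = _check_size(a)
    if not 1 <= k <= d:
        raise ValueError(f"k must lie in 1..{d}")
    return [k + m * d for m in range(a)]


def consecutive_day(a: int, k: int) -> list[int]:
    """Units in S'_k(a): x_{a(k-1)+1}, ..., x_{a(k-1)+a}."""
    d = _check_size(a)
    if not 1 <= k <= d:
        raise ValueError(f"k must lie in 1..{d}")
    return list(range(a * (k - 1) + 1, a * k + 1))


def random_day(a: int, rng: random.Random) -> list[int]:
    """a distinct basic units chosen at random (assumed: without replacement)."""
    _check_size(a)
    return sorted(rng.sample(range(1, N_UNITS + 1), a))


def trapnest_days_per_year(a: int) -> int:
    """Each basic unit covers one day in each of the 12 months."""
    return 12 * a


rng = random.Random(1949)
k = rng.randint(1, N_UNITS // 4)  # choose one of the d_4 = 7 systematic samples
print("interval-day   ", interval_day(4, k))
print("consecutive-day", consecutive_day(4, k))
print("random-day     ", random_day(4, rng))
```

With $a = 4$ each systematic scheme partitions the 28 units into 7 disjoint samples, in agreement with the illustrations given for $a = 4$, while the random-day scheme can pick any 4 of the 28 units.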


THEORY 


The following discussion will refer to interval-day trapnesting. With 
appropriate changes in notation it applies also to consecutive-day 
trapnesting. 

Let $y_{ijk}(a)$ denote the egg production in the sample $S_k(a)$ (adjusted to an estimate of the total yearly production) for the $j$th hen in the $i$th group. Then the model expressing the egg production in this sample is given by the following equations:

$$y_{ijk}(a) = \mu + g_i + h_{ij} + s_k(a) + r_{ik}(a) + e_{ijk}(a), \qquad i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, q;\; k = 1, 2, \ldots, d_a,$$

where:

1. $\mu$ = the true average yearly production per hen over all $p$ groups.
2. $g_i$ = the deviation of the true mean of the $i$th group from $\mu$.
3. $h_{ij}$ = the deviation of the production of the $j$th hen in the $i$th group from $\mu + g_i$. The $h_{ij}$ are a random sample from an infinite population with mean zero and variance $\sigma_h^2$.
4. $s_k(a)$ = the deviation of the production in sample $S_k(a)$, averaged over all $pq$ hens, from $\mu$. These deviations comprise a finite population with mean zero and variance $\sigma_s^2(a) = \frac{1}{d_a}\sum_k s_k^2(a)$.
5. $r_{ik}(a)$ = the deviation of the production in sample $S_k(a)$, averaged over the $q$ hens in the $i$th group, from $\mu + g_i + s_k(a)$. For each $i$ these deviations comprise a finite population with mean zero and variance $\sigma_r^2(a) = \frac{1}{d_a}\sum_k r_{ik}^2(a)$.
6. $e_{ijk}(a)$ = the deviation of the production in $S_k(a)$ for the $j$th hen in the $i$th group from $\mu + g_i + h_{ij} + s_k(a) + r_{ik}(a)$. These deviations are a random sample from an infinite population with mean zero and variance $\sigma_e^2(a)$, subject to the restrictions $\sum_k e_{ijk}(a) = 0$.


Now assume that a single interval-day sample of size $a$ is selected for the $q$ hens in each of two groups. Let $S_k(a)$ be the sample selected. Then the difference in yearly production between the two groups, $g_1 - g_2$, is estimated by

$$D(a) = \bar{y}_{1\cdot k}(a) - \bar{y}_{2\cdot k}(a),$$

where

$$\bar{y}_{h\cdot k}(a) = \frac{1}{q}\sum_{j=1}^{q} y_{hjk}(a), \qquad h = 1, 2.$$

The sampling variance of $D(a)$ is

$$V[D(a)] = \frac{2}{q}\left[\sigma_h^2 + \sigma_e^2(a)\right] + 2\sigma_r^2(a).$$
If two interval-day samples of size $a$ are drawn independently, the first being used for the $q$ hens in the first group and the second for the $q$ hens in the second group, then $g_1 - g_2$ is estimated by

$$D_1(a) = \bar{y}_{1\cdot k}(a) - \bar{y}_{2\cdot l}(a),$$

where the $k$th and $l$th samples are the ones selected for the first and second groups respectively. In this case the sampling variance is

$$V[D_1(a)] = \frac{2}{q}\left[\sigma_h^2 + \sigma_e^2(a)\right] + 2\left[\sigma_s^2(a) + \sigma_r^2(a)\right].$$


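If a single random-day sample of size $a$ were selected for the $q$ hens in each of two groups, letting $S_{k_1}(1), S_{k_2}(1), \ldots, S_{k_a}(1)$ denote the sample selected, then the group difference is estimated by

$$D''(a) = \bar{y}''_1(a) - \bar{y}''_2(a),$$

where

$$\bar{y}''_h(a) = \frac{1}{qa}\sum_{j=1}^{q}\sum_{m=1}^{a} y_{hjk_m}(1), \qquad h = 1, 2.$$

The sampling variance of $D''(a)$ is

$$V[D''(a)] = \frac{2}{q}\left[\sigma_h^2 + \frac{28-a}{27a}\,\sigma_e^2(1)\right] + \frac{2(28-a)}{27a}\,\sigma_r^2(1).$$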
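If a different random-day sample is independently drawn for each group, $g_1 - g_2$ is estimated by

$$D''_1(a) = \frac{1}{qa}\sum_{j=1}^{q}\sum_{m=1}^{a} y_{1jk_m}(1) - \frac{1}{qa}\sum_{j=1}^{q}\sum_{m=1}^{a} y_{2jl_m}(1),$$

with sampling variance

$$V[D''_1(a)] = \frac{2}{q}\left[\sigma_h^2 + \frac{28-a}{27a}\,\sigma_e^2(1)\right] + \frac{2(28-a)}{27a}\left[\sigma_s^2(1) + \sigma_r^2(1)\right],$$

where $S_{l_1}(1), S_{l_2}(1), \ldots, S_{l_a}(1)$ is the sample drawn for the second group.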

It will be convenient to compare the three trapnest sampling schemes in terms of the sampling variances of estimates of group differences. In order to estimate these sampling variances it is necessary to estimate the variance components $\sigma_h^2$, $\sigma_s^2(a)$, $\sigma_r^2(a)$ and $\sigma_e^2(a)$ for interval-day and consecutive-day samples when $a$ = 1, 2, 4, 7, 14. (The variance components and their estimates will be "primed" for consecutive-day samples.) These estimates may be computed easily from an analysis of sums of squares. Table 1 shows the appropriate analysis for any value of $a$ for interval-day samples. The analysis for consecutive-day samples is analogous, and for $a = 1$ is, of course, identical. The last column of Table 1 contains the expectations of the sums of squares. These expectations may be verified by reference to Daniels (1939) and Crump (1946).

Estimates of the variance components are obtained by setting the sums of squares of a given analysis equal to their expectations, with $s^2$ substituted for $\sigma^2$, and solving the resulting set of linear equations for $s_h^2$, $s_s^2(a)$, $s_r^2(a)$ and $s_e^2(a)$. The estimates of the $\sigma^2$'s, then, are the corresponding $s^2$'s.


RESULTS 


In order to clarify the method of estimating the variance components 
outlined in the preceding section, the numerical analysis of the sums of 
squares for a = 1 is shown in Table 2. 


TABLE 2
THE ANALYSIS OF THE SUMS OF SQUARES FOR $a = 1$

Source of variation            Degrees of   Sums of      Expectations of the
                               freedom      squares      sums of squares
Groups                             4          543,419    $4\cdot 28\,\sigma_h^2 + 20\cdot 28\sum_i g_i^2$
Samples                           27          130,027    $28\,\sigma_e^2(1) + 20\cdot 28\,\sigma_r^2(1) + 5\cdot 20\cdot 28\,\sigma_s^2(1)$
Hens within groups                95        3,642,188    $5\cdot 19\cdot 28\,\sigma_h^2$
Samples × groups                 108          224,387    $4\cdot 28\,\sigma_e^2(1) + 4\cdot 20\cdot 28\,\sigma_r^2(1)$
Samples × hens within groups    2565        5,245,953    $5\cdot 19\cdot 28\,\sigma_e^2(1)$


The sums of squares and their expectations constitute a set of linear equations for estimating the variance components. Solving these equations gives the following estimates:

$s_h^2 = 1369.24$
$s_e^2(1) = 1972.16$
$s_s^2(1) = 26.40$
$s_r^2(1) = 1.56$
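The arithmetic behind these estimates can be checked with a short sketch. Because each expectation in Table 2 introduces only one new component, the equations can be solved one at a time. The sketch assumes that the coefficients of Table 2 generalize as $p(q-1)d$, $(p-1)d$, $(p-1)qd$, $d$, $qd$ and $pqd$ with $p = 5$ groups, $q = 20$ hens and $d = 28$ samples; that general form is our reading of the $a = 1$ expectations, not a formula stated in the text, and the name `estimate_components` is hypothetical.

```python
# Sums of squares for a = 1, taken from Table 2.
SS = {
    "samples": 130_027,
    "hens_within_groups": 3_642_188,
    "samples_x_groups": 224_387,
    "samples_x_hens": 5_245_953,
}


def estimate_components(ss: dict, p: int = 5, q: int = 20, d: int = 28) -> dict:
    """Equate each sum of squares to its expectation and solve for the s^2's.

    Expectations (Table 2, written with p groups, q hens, d samples):
      samples x hens within groups: p(q-1)d * s_e2
      hens within groups:           p(q-1)d * s_h2
      samples x groups:             (p-1)d * s_e2 + (p-1)qd * s_r2
      samples:                      d * s_e2 + qd * s_r2 + pqd * s_s2
    """
    s_e2 = ss["samples_x_hens"] / (p * (q - 1) * d)
    s_h2 = ss["hens_within_groups"] / (p * (q - 1) * d)
    s_r2 = (ss["samples_x_groups"] - (p - 1) * d * s_e2) / ((p - 1) * q * d)
    s_s2 = (ss["samples"] - d * s_e2 - q * d * s_r2) / (p * q * d)
    return {"s_h2": s_h2, "s_e2": s_e2, "s_s2": s_s2, "s_r2": s_r2}


for name, value in estimate_components(SS).items():
    print(f"{name} = {value:.2f}")
```

Rounded to two decimals, the printed values agree with the four estimates listed above.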


The component arising from variation among hens, $\sigma_h^2$, is clearly independent of the sampling scheme and of the size of the sample. Hence the estimate $s_h^2$ will remain the same in all analyses of the sums of squares. The variation arising from differences in the behavior of a given sampling scheme over the groups is measured by $\sigma_r^2(a)$ ($\sigma_r'^2(a)$ in the case of consecutive-day sampling). On intuitive grounds this variation is not expected to be large. For $a = 1$, $s_r^2(1) = 1.56$, which is comparatively very small. For other values of $a$ both $s_r^2(a)$ and $s_r'^2(a)$ are also small, and in the remainder of the discussion $\sigma_r^2(a)$ and $\sigma_r'^2(a)$ are neglected.

Table 3 gives the estimates of the variance components $s_e^2(a)$, $s_s^2(a)$, $s_e'^2(a)$ and $s_s'^2(a)$.


TABLE 3
ESTIMATES OF THE VARIANCE COMPONENTS

Sample size   Interval-day samples        Consecutive-day samples
a             $s_e^2(a)$   $s_s^2(a)$     $s_e'^2(a)$   $s_s'^2(a)$
 2            925.7        6.1            862.1         23.6
 4            430.8        0.0            333.7         19.6
 7            244.0        0.0            195.5         13.3
14             98.5        0.0             74.0         12.4


It is to be noted that for all values of $a$, $s_e^2(a) > s_e'^2(a)$ and $s_s^2(a) < s_s'^2(a)$. Both of these inequalities indicate that the differences among interval-day samples are less consistent than those among samples taken on consecutive days. This is not entirely unexpected, and reflects cyclic changes in egg production within months, whose influence is more apparent when samples are taken on consecutive days.

Table 4 shows the sampling errors of the difference between average 
yearly production for two groups of 20 birds each sampled on the same 
days, and sampled on different days. 

The most notable feature of Table 4 is the apparent uniformity within the rows of the table. The differences among the three sampling schemes are small when groups of size 20 are to be compared. It is interesting to observe that for groups trapnested on the same days the consecutive-day scheme has the lowest sampling error for every value of $a$ greater than 1, but when groups are trapnested on different days it has the highest. This results from the two inequalities mentioned in connection with Table 3. There is only a small loss in accuracy in trapnesting 14 days per month under any of the sampling schemes compared with trapnesting 28 days per month. The former shows an approximate sampling error of 12.1 eggs compared with 11.7 eggs for the latter. Also, there is a difference of only about 1½ eggs in sampling error between trapnesting 14 days and 4 days per month. It is evident that the error resulting from incomplete trapnesting is of minor importance when groups as large as 20 are being compared. The relative importance of group size (number of birds) and the completeness of trapnesting in reducing sampling variance will be considered in the next section.

TABLE 4
SAMPLING ERRORS OF THE DIFFERENCE BETWEEN THE AVERAGE YEARLY
PRODUCTION OF TWO GROUPS OF 20 BIRDS EACH FOR DIFFERENT SAMPLING
SCHEMES
(RESULTS ARE IN NUMBER OF EGGS PER HEN PER YEAR)

              Groups trapnested on same days     Groups trapnested on different days
Sample size   Interval  Consecutive  Random      Interval  Consecutive  Random
a             days      days         days        days      days         days
 1            18.3      18.3         18.3        19.0      19.0         19.0
 2            15.1      14.9         15.2        15.3      15.7         15.6
 4            13.4      13.0         13.4        13.4      13.8         13.7
 7            12.7      12.5         12.6        12.7      13.0         12.7
14            12.1      12.0         12.0        12.1      12.5         12.1
28            11.7      11.7         11.7        11.7      11.7         11.7
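The same-days columns of Table 4 can be recomputed from the variance formulas of the Theory section, using $q = 20$ and neglecting $\sigma_r^2$ as stated after Table 2. The sketch below is offered only as a check on the arithmetic, not as the authors' computation. Its assumptions are that $a = 28$ stands for complete trapnesting with no sampling error, that the random-day column applies the factor $(28-a)/(27a)$ to $s_e^2(1)$, and that the helper name `se_same_days` is hypothetical. It does not cover the different-days columns.

```python
import math

S_H2 = 1369.24  # s_h^2, the same for every scheme
S_E2 = {  # s_e^2(a) and s_e'^2(a): a = 1 from Table 2, a = 2..14 from Table 3, a = 28 complete
    "interval":    {1: 1972.16, 2: 925.7, 4: 430.8, 7: 244.0, 14: 98.5, 28: 0.0},
    "consecutive": {1: 1972.16, 2: 862.1, 4: 333.7, 7: 195.5, 14: 74.0, 28: 0.0},
}


def se_same_days(scheme: str, a: int, q: int = 20) -> float:
    """Standard error of the group difference when both groups are trapnested
    on the same days: sqrt((2/q) * [s_h^2 + e-part]), with sigma_r^2 neglected."""
    if scheme == "random":
        # a of the 28 basic units drawn without replacement
        e_part = (28 - a) / (27 * a) * S_E2["interval"][1]
    else:
        e_part = S_E2[scheme][a]
    return math.sqrt(2 / q * (S_H2 + e_part))


for a in (1, 2, 4, 7, 14, 28):
    row = [se_same_days(s, a) for s in ("interval", "consecutive", "random")]
    print(f"{a:>2}  " + "  ".join(f"{v:5.1f}" for v in row))
```

Each printed row matches the corresponding same-days entries of Table 4 to one decimal place.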


PRACTICAL APPLICATION 


Hays (1946) states that daily trapnesting adds about one dollar per 
year to the cost of keeping a hen. The trend in present poultry breeding 
operations to minimize the importance of high individual hen records 
and correspondingly to lay greater stress on the average production of 
families of full sibs or half-sibs makes it desirable to consider jointly the 
error resulting from incomplete trapnesting and limited group size. 
Both affect the accuracy of a family average. With a knowledge of the 
error expected from these two sources it is possible to estimate the level 
of accuracy for any combination of number of days of trapnesting and 
flock size. 

Figure 1 shows graphically the per cent increase in group or flock size that will just offset the errors resulting from incomplete sampling when trapnesting is conducted on the same days for all hens in each group. The required increase in flock size is proportional to $\sigma_e^2(a)/\sigma_h^2$ ($\sigma_e'^2(a)/\sigma_h^2$ for consecutive-day trapnesting) for any given system of trapnesting. From the graph it is seen that the accuracy lost by trapnesting 14 days per month compared with complete trapnesting could be recovered by increasing group size by only 7 per cent. Increasing group size by 25 per cent would allow trapnesting on only 4 days per month, or 1/7 as much. Thus, groups such as full sibs averaging 8 in number and trapnested daily would be no more reliable than groups of 10 trapnested only 4 days per month. For testing large groups such as the progeny of a sire, where groups may contain as many as 30 birds, the results show that 32, 35, 39 and 50 birds trapnested 14, 7, 4 and 2 days per month, respectively, would give production records as accurate as those obtained from 30 birds trapnested daily.

FIGURE 1. The per cent increase in number of birds per group that will offset the loss in accuracy due to incomplete trapnesting. Panels: interval-day trapnesting; consecutive-day trapnesting. Horizontal axis: days per month of trapnesting (2, 4, 7, 14, 28); vertical axis: per cent increase in group size.
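The group sizes quoted above can be checked by equating variances. Under the same-days formula with $\sigma_r^2$ neglected, $q$ hens trapnested on a sample of size $a$ give the same variance as $n$ hens trapnested daily when $q = n[1 + \sigma_e^2(a)/\sigma_h^2]$. The sketch below applies this relation to the interval-day estimates of Table 3. The function names are hypothetical, and rounding to the nearest whole bird is our assumption.

```python
S_H2 = 1369.24
S_E2_INTERVAL = {14: 98.5, 7: 244.0, 4: 430.8, 2: 925.7}     # Table 3
S_E2_CONSECUTIVE = {14: 74.0, 7: 195.5, 4: 333.7, 2: 862.1}  # Table 3


def percent_increase(s_e2_a: float, s_h2: float = S_H2) -> float:
    """Per cent more birds needed to offset sampling on a days per month."""
    return 100.0 * s_e2_a / s_h2


def equivalent_group_size(n_daily: int, s_e2_a: float, s_h2: float = S_H2) -> int:
    """Birds trapnested a days/month that match n_daily birds trapnested daily.

    From (2/q)(s_h2 + s_e2_a) = (2/n) s_h2, i.e. q = n (1 + s_e2_a / s_h2),
    rounded to the nearest whole bird.
    """
    return round(n_daily * (1 + s_e2_a / s_h2))


for a, s_e2 in S_E2_INTERVAL.items():
    print(f"{a:>2} days: +{percent_increase(s_e2):.0f}% -> "
          f"{equivalent_group_size(30, s_e2)} birds instead of 30")
```

For consecutive-day trapnesting on 4 days per month the same ratio is about 24 per cent, close to the 25 per cent quoted in the text.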
SUMMARY


One hundred first-year egg production records were divided into partial records of 1, 2, 4, 7 and 14 days per month to correspond to different degrees of incomplete trapnesting. Three methods of incomplete trapnesting (sampling) were considered: (1) interval-day, with the trapnest days spaced at regular intervals throughout the month; (2) consecutive-day, with the trapnest days taken consecutively; and (3) random-day trapnesting. For each size of sample and method of sampling the total variance in egg production was separated into three components: hens, sampling days, and remainder. From these, sampling errors were estimated for the different methods and sizes of samples.

The differences in accuracy among the three methods are small. Interval-day trapnesting may be slightly more accurate when the groups of birds to be compared are trapnested on different days, but when trapnesting is conducted on the same days consecutive-day sampling is slightly more accurate. The standard error of the difference between two groups of 20 birds is about 11.7 eggs per hen per year under complete trapnesting. The standard error increases by about 0.4 of an egg when trapnesting is reduced to a half-time basis and by 1.5 eggs when the amount of trapnesting is reduced to 1/7 of the time. The accuracy lost by trapnesting 14 days per month could be recovered by increasing group size by only 7 per cent, while that for trapnesting only 4 days per month could be recovered by increasing group size by 25 per cent.
SAMPLING ALYCE CLOVER FOR CHEMICAL ANALYSES¹


J. A. RIGNEY AND R. E.


Alyce clover (Alysicarpus vaginalis (L.) D.C.) is a summer 
annual legume used for hay and pasture in Florida. Since the mois- 
ture and fertility requirements of the plant were not known, several 
tests were initiated to study its adaptation. The effects of various 
fertilizer treatments on yield and herbage composition have 
already been published [1]. 

During the progress of these experiments, the problem arose 
as to optimum techniques for sampling the plots for chemical 
evaluation. In the absence of sufficient information on the problem, 
sampling data were collected on one experiment and these data 
form the subject of this paper. 


METHODS 


THE EXPERIMENT was located near Gainesville, Florida, on a reasonably uniform field of Norfolk fine sand. Ten fertilizer treatments were randomized in each of three blocks. The plots were 10 × 30 feet and were arranged within the blocks so as to keep the between-plot variability at a minimum.

The fertilizer materials were broadcast uniformly and then disked 
once to a depth of about four inches by running a nine-foot disk length- 
wise through each plot. When cut for hay, the plants varied in height 
from 24 to 36 inches. Samples for chemical analyses were taken from 
the plots before mowing when about one-fourth of the flowers were open. 

Sampling data were taken on only five of the treatments in the three 
blocks. Two people independently obtained herbage samples from each 
of the fifteen plots. Each plot was sampled by following a zig-zag path 
lengthwise through the plot. A “grab sample” was taken at the end of
alternate paces, making a total of 12 to 15 “grabs” per plot. The plants 
in each “grab” were cut approximately 3 inches above ground with a 
hand sickle. The green weight of the plant material per plot averaged 
5.5 pounds. 

The samples were dried in an oven at 70° C. and ground. After 
thorough mixing, a subsample of the material was placed in a pint jar. 

The chemical analyses were made in the Agronomy Department 
laboratory at the University of Florida. Two sub-samples were drawn 


¹Joint contribution from the Institute of Statistics of The University of North Carolina, Raleigh, and the Florida Agricultural Experiment Station, Journal Paper No. 298.

²Plant Science Statistician, Institute of Statistics, and Professor of Agronomy, Cornell University, formerly Agronomist with the Florida Agricultural Experiment Station.
from each jar for analysis. Hence there were four separate ashings and
analyses from each plot, i.e., two sub-samples from each of two plot 
samples. Standard chemical analytical techniques were used to de- 
termine the percentage P, K, Ca and Mg in the samples. 


RESULTS AND DISCUSSION 


The analyses of variance for the four constituents are shown in the upper part of Table I. Only those sources of variation which measure sampling variances are of interest in this paper, but mean squares for replication and treatments are included to indicate the complete analysis. The last column of the upper portion indicates the composition of the mean squares that are of interest. The estimates of the variance components given in the lower part of the table were computed from the respective mean squares as indicated by their algebraic compositions.

TABLE I
ANALYSES OF VARIANCE OF DATA ON PHOSPHORUS, POTASSIUM,
CALCIUM AND MAGNESIUM CONTENTS OF ALYCE CLOVER

                                      Mean squares
Source of variation   d.f.   P          K          Ca         Mg         Composition of M.S.
Replication             2    .004399    .039655    .006625    .009592
Treatments              4    .000429    .103821    .025075    .010489
Exp. error              8    .000866    .107729    .033069    .003675    $V_d + 2V_s + 4V_p$
Samples in plots       15    .000239    .041…                             $V_d + 2V_s$
Determinations         30    .000007    .000306    .000169    .000328    $V_d$

ESTIMATES OF VARIANCE COMPONENTS
Plots ($V_p$)                .000157    .016562    .003560    .000015
Samples ($V_s$)              .000116    .020588    .009329    .001644
Determinations ($V_d$)       .000007    .000306    .000169    .000328
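Because each mean square in the composition column adds one component to the one below it, the estimates follow by successive subtraction. A minimal sketch using the phosphorus mean squares from Table I (the function name `variance_components` is hypothetical):

```python
def variance_components(ms_error: float, ms_samples: float, ms_determinations: float):
    """Solve the Table I compositions for (V_p, V_s, V_d).

    Exp. error       = V_d + 2 V_s + 4 V_p
    Samples in plots = V_d + 2 V_s
    Determinations   = V_d
    """
    v_d = ms_determinations
    v_s = (ms_samples - ms_determinations) / 2
    v_p = (ms_error - ms_samples) / 4
    return v_p, v_s, v_d


v_p, v_s, v_d = variance_components(0.000866, 0.000239, 0.000007)
print(f"V_p = {v_p:.6f}, V_s = {v_s:.6f}, V_d = {v_d:.6f}")
```

The divisors 2 and 4 reflect the design: two determinations per field sample and four determinations per plot.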

A comparison of the estimated variances indicates that the laboratory 
technique of sub-sampling the ground plant material and making the 
actual determinations was satisfactory, since the standard error for 
determinations was less than 5% in all cases. The variation between 
successive samples from the same plot was relatively large and even 
exceeded plot-to-plot variation for K, Ca and Mg. Plot variance for
Mg is unusually low compared to that found in other studies. 

The variance of a treatment mean may be conveniently indicated as 


$$V_t = \frac{V_p}{p} + \frac{V_s}{s} + \frac{V_d}{d},$$

where $V_p$, $V_s$ and $V_d$ are the estimated true variances due to plots,
samples in plots and determinations, and p, s and d are the number of 
plots, samples and determinations per treatment, respectively. It has 
been shown [2] that this form is useful in evaluating composites of several 
samples from one plot or composites of several plots. Table II gives the variance of a treatment mean for different numbers of plots, samples and determinations per plot, for phosphorus only, since this constituent illustrates the methods involved.

TABLE II
THE EFFECT OF ALTERING THE SAMPLING SCHEME ON
THE ACCURACY AND COST OF TREATMENT MEANS

Number of   Number of   Number of   Variance of      Confidence    Relative    Cost per unit
plots per   samples     detns.      treatment mean   interval      cost per    information
treatment   per plot    per plot    ($V_t$)          ($t_{.05}s_{\bar{x}}$)   treatment   (× 100)

Phosphorus
3           1           1           .000093          .022          254         2.36
3           1           2           .000092          .022          344         3.16
3           1           3           .000092          .022          434         3.99
3           2           1¹          .000073          .020          275         2.00
3           3           1¹          .000067          .019          296         1.98
6           1           1           .000046          .016          485         2.23
6           2           1¹          .000037          .014          547         2.02
3²          2           4           .000072          .020          545         3.92

¹Single determinations on composites of two or three samples.
²This is the scheme actually used in the study.

The first three lines of the table
show that increasing the number of determinations has little effect in 
reducing the sample variance. This is equivalent to saying that further 
refinement of the laboratory technique would be of little value under 
these conditions. However, taking more samples per plot is quite effective in improving the accuracy (compare lines 1, 4 and 5 of Table II). For example,
the variance of the mean was reduced from .000093 to .000073 by taking 
two samples per plot and making a single determination on the compos- 
ite. A third sample per plot was not as effective as the second. It 
should be emphasized here that taking an additional sample implies 
making a new randomly determined trip through the plot and collecting 
12 to 15 “grabs”. It is not possible to estimate from these data the
effects of increasing the number of “grabs” in the initial trip. It should 
also be pointed out that these and following remarks assume that errors 
of subsampling the ground material will be unaffected by the number of 
samples composited. This seems to be a reasonable assumption at least 
for a limited range of samples if the material is thoroughly mixed. 

Another method of reducing the sample variance would be to increase 
the number of replications of the entire experiment. Doubling the num- 
ber of replications but taking the same number of samples and determina- 
tions per plot would of course double the accuracy of a treatment mean. 
However, increasing the number of plots is much more expensive than 
taking more samples per plot. Therefore, where the individual plots 
are poorly sampled, this may not prove to be the most efficient way to 
improve the technique, as shown below. 

The fifth column of Table II shows the confidence interval for each 
sampling procedure. This value is computed as $t_{.05}(V_t)^{1/2}$ and is
interpreted as the interval on either side of the observed mean having a 
95% probability of including the true value for that treatment. Thus, 
under these conditions, by making a single determination on a composite 
of two samples in each of three replications, one might confidently (95%) 
expect to be within 0.020 of the true phosphorus percentage for a given 
treatment. 

Final decision as to the best method of reducing the variance of a 
treatment mean must include some notion of cost. Such an approach 
has been widely used by agricultural economists, but few attempts have 
been reported by agronomists. Several forage-crop investigators supplied the writers with estimates of the relative costs of the three main procedures involved in this study. The estimates were given as the cost of the initial unit (plot, sample or determination) and the relative cost
of each additional unit. Average estimates were as follows: 


                     Relative cost of
                     Initial    Additional
Plots ............     60          40
Samples ..........     10           7
Determinations ...     30          30

From such estimates, the relative cost of any sampling scheme may be evaluated by setting

$$\text{cost} = 60 + 40(p - 1) + 10 + 7(s - 1) + 30d.$$

Costs so derived are given in column 6 of Table II.

The relative amount of information that is supplied by a particular 
mean is inversely proportional to its variance. Therefore cost/(1/V;) 
gives the relative cost per unit information. The sampling scheme that 
gives the lowest cost per unit information may be considered the most 
efficient. Some cost comparisons are shown in the last column of table 
II. In general these data indicate that increasing the number of determinations per plot is expensive when the sampling technique and the
plot-to-plot variations are so large. Improving the sampling of the 
individual plots was desirable up to two or three samples per plot. 
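The entries of Table II can be approximated from the phosphorus components of Table I and the cost function above. In this sketch $p$, $s$ and $d$ are totals per treatment, so a scheme with 2 samples per plot composited into a single determination has $s = 2p$ and $d = p$. Two details are our assumptions: the value $t_{.05} = 2.306$ (5 per cent level, 8 degrees of freedom, the experimental-error d.f. of Table I), and the helper name `evaluate_scheme`. For example, the first row gives a variance of about .000093, an interval of about .022 and a cost of 254.

```python
import math

# Phosphorus variance components (Table I, lower part).
V_P, V_S, V_D = 0.000157, 0.000116, 0.000007
T_05 = 2.306  # assumed: two-sided 5% t value for 8 d.f.


def evaluate_scheme(plots: int, samples_per_plot: int, dets_per_plot: int):
    """Return (V_t, confidence half-width, relative cost, cost x V_t x 100)."""
    p = plots
    s = plots * samples_per_plot  # field samples per treatment
    d = plots * dets_per_plot     # chemical determinations per treatment
    v_t = V_P / p + V_S / s + V_D / d
    half_width = T_05 * math.sqrt(v_t)
    cost = 60 + 40 * (p - 1) + 10 + 7 * (s - 1) + 30 * d
    cost_per_info = cost * v_t * 100  # cost / (1 / V_t), scaled by 100
    return v_t, half_width, cost, cost_per_info


for scheme in [(3, 1, 1), (3, 1, 2), (3, 2, 1), (3, 3, 1), (3, 2, 4)]:
    v_t, ci, cost, cpi = evaluate_scheme(*scheme)
    print(scheme, f"V_t={v_t:.6f}  ci={ci:.3f}  cost={cost}  cost/info={cpi:.2f}")
```

Treating a composite as several samples but one determination is what makes compositing attractive here: it reduces the $V_s$ term without paying the high cost of extra analyses.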

If the above cost function is set up for a fixed degree of accuracy ($V_t$), it can be minimized to give the optimum proportion of plots to
samples to determinations. Table III shows these ratios derived for the 
four constituents studied. 

TABLE III
OPTIMUM RATIO OF NUMBER OF PLOTS, SAMPLES AND
DETERMINATIONS PER TREATMENT

                  P     K     Ca    Mg
Plots             4     6     4     1
Samples           8    16    15    26
Determinations    1     1     1     6


For phosphorus, a single determination on the composite of 8 samples, 
two from each of four plots, would result in a minimum cost per unit 
information. The other constituents give similar ratios except Mg 
which had an unusually low plot variance. 
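The text does not show the computation behind Table III. A standard way to obtain such ratios, sketched here under that assumption, is to minimize the linear cost $40p + 7s + 30d$ (the marginal costs in the cost function) subject to $V_p/p + V_s/s + V_d/d = V_t$ with a Lagrange multiplier. This makes each count proportional to the square root of its variance component divided by its cost, so that $p : s : d = \sqrt{V_p/40} : \sqrt{V_s/7} : \sqrt{V_d/30}$. Applied to the Table I components, this reproduces the P and Ca columns after rounding, and gives roughly 6.4 : 17 : 1 for K and 1 : 25 : 5.4 for Mg, close to but not identical with the printed ratios. The name `optimum_ratio` is hypothetical.

```python
import math

# Marginal relative costs: 40 per plot, 7 per sample, 30 per determination.
COSTS = (40, 7, 30)

# (V_p, V_s, V_d) from the lower part of Table I.
COMPONENTS = {
    "P": (0.000157, 0.000116, 0.000007),
    "K": (0.016562, 0.020588, 0.000306),
    "Ca": (0.003560, 0.009329, 0.000169),
    "Mg": (0.000015, 0.001644, 0.000328),
}


def optimum_ratio(components, costs=COSTS):
    """Plots : samples : determinations minimizing cost for a fixed V_t.

    Minimizing sum(c_i * n_i) subject to sum(V_i / n_i) = V_t gives
    n_i proportional to sqrt(V_i / c_i); the result is scaled so that
    the smallest entry equals 1.
    """
    raw = [math.sqrt(v / c) for v, c in zip(components, costs)]
    smallest = min(raw)
    return tuple(x / smallest for x in raw)


for name, comps in COMPONENTS.items():
    p, s, d = optimum_ratio(comps)
    print(f"{name:>2}: {p:5.1f} : {s:5.1f} : {d:4.1f}")
```

Only the ratios are fixed by this argument; the absolute numbers are then scaled up until the required $V_t$ is reached.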

In practice, it would not be desirable to composite all replications as 
indicated in table III since no estimate of error would be available. 
However, if a large number of treatments is being tested in as many as 
four replications, it might be desirable to composite pairs of replications. 
Since it would be desirable to have a fair estimate of experimental error, 
a reasonable criterion for deciding how much compositing of replications 
could be done would be that the degrees of freedom for error did not 
fall below 15. The values for cost per unit information in table II indi- 
cated that the number of plots and samples per plot could be varied 
considerably without deviating seriously from the lowest cost. However, 
the number of determinations must be limited more carefully due to the 
high cost and low variation associated with the chemical analyses.
There are some apparent inconsistencies between the ratios in Table
III and those of table II. For example, in table II three samples per 
plot was slightly more efficient than two for phosphorus but the above 
ratios indicate that two is optimum. This disparity arises from the 
fact that more than the optimum number of determinations were used 
in table II. If the number of determinations is set equal to the number 
of plots, the optimum number of samples per plot becomes 2.6. 

Potassium was more variable than the other constituents although 
there was no apparent reason for this. A single determination on a 
composite of two samples from each of three plots would give a confidence 
interval that was 25% of the over-all mean for K, compared to 9%, 
14% and 14% for P, Ca and Mg, respectively. If it were desirable to 
reduce the confidence interval of K to 10% of the general mean, $V_t$
would need to be reduced to .00139. According to the ratios of Table III
the optimum technique would require 19 plots, 50 samples and 3 de- 
terminations. While this number of plots and samples seems absurdly 
high, it illustrates the difficulty of obtaining a high degree of accuracy 
with such variable material. 


SUMMARY AND CONCLUSION 


Duplicate determinations on each of two field samples from three 
replications of five fertilizer treatments on Alyce Clover provided data 
for estimating variances due to plots, samples and chemical determina- 
tions. These estimates were obtained for percentage P, K, Ca and Mg 
in the clover hay. The accuracy of treatment means involving different 
numbers of plots, samples and determinations was examined. Relative 
costs for the three phases of the procedure were estimated and the cost 
per unit information computed for the various schemes under study. 
The optimum ratio of plots: samples: determinations was calculated for 
a constant variance of the mean. 

In general, the relatively high cost and low variance of the laboratory 
determinations require that this part of the technique be reduced to a 
minimum. The optimum ratio of total samples to total determinations 
per treatment varied from 4 for Mg to 16 for K. Except for the unusually low plot-to-plot variance of Mg, the optimum number of samples per plot ranged from 2 for P to 4 for Ca.
THE ANALYSIS OF COVARIANCE AND
NON-ORTHOGONAL COMPARISONS 


M. H. QUENOUILLE
Marischal College, Aberdeen, Scotland 


INTRODUCTION 


F. Yates [1] has defined orthogonality as that property of a design
which ensures that the different classes of effects to which the
experimental material is subject shall be capable of direct and separate
estimation without any entanglement. Obviously orthogonality is a
property to be desired in any design, but unfortunately the design of an
experiment cannot always be determined prior to the commencement of
the experiment, while experiments which are planned as orthogonal are
frequently ‘confounded’ by extraneous causes. Yates, for example,
considered an experiment on the growth of chickens on three different
diets. Because of the difficulty of determining sex at hatching, the
proportions of cockerels in the three groups will usually vary, and since
cockerels grow faster than pullets, the effect of sex must be taken into
account if the comparisons between diets are to be accurate.

Commonly the method of least squares is employed in the analysis
of non-orthogonal comparisons, but the labour involved in solving
several simultaneous equations and estimating, in R. A. Fisher’s
notation [2], the elements $c_{ij}$ of the inverse matrix is frequently large.
The purpose of this note is to show how, when the deviations from
orthogonality are small, it is often possible to carry out this calculation
using the analysis of covariance, thus reducing the algebraic procedure of
solving the normal least squares equations to the arithmetical procedure
of the analysis of covariance. The normal least squares equations give
rise to direct solutions when all comparisons are orthogonal, and while a
slight deviation from orthogonality will frequently yield a set of
apparently difficult equations, the analysis of covariance reduces these
equations with the minimum of calculation.
THE BASIS OF THE METHOD


The method is in fact an extension of that suggested by M. S.
Bartlett [3] for the simplest case of non-orthogonality, namely one missing
observation. Bartlett suggested that the analysis could be carried out
in the normal manner with the missing observation replaced by zero
(or any other convenient value), if a covariance analysis were
simultaneously carried out on a second set of observations in which the value
corresponding to the missing observation is taken as one and all other
observations as zero. This is quite a neat method of estimating
and adjusting for the missing observation, and, as Bartlett pointed out,
it can be used to compensate for several missing values, although the
task becomes more arduous as the number of missing values is increased
and the degree of orthogonality is decreased. However, this same method
is useful for other slight deviations from orthogonality, as is
demonstrated by the following examples.
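Bartlett's device for one missing value can be sketched for a randomized-block layout with one observation per block × treatment cell. This is a minimal sketch, assuming the table is complete apart from the single missing cell; the function names and the small example table are illustrative and do not come from the paper. The missing cell is set to zero, a pseudo-variate equal to one at that cell is added, and the error regression coefficient b gives the estimate −b. For comparison the sketch also evaluates the familiar direct formula (rB + tT − G)/((r − 1)(t − 1)), where r is the number of blocks, t the number of treatments, B and T the block and treatment totals containing the missing cell, and G the grand total.

```python
import numpy as np


def double_centre(m: np.ndarray) -> np.ndarray:
    """Residuals of a blocks x treatments table after removing block and treatment means."""
    return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()


def missing_value_by_covariance(y: np.ndarray, i: int, j: int) -> float:
    """Estimate cell (i, j) by covariance on a 0/1 pseudo-variate."""
    y0 = y.astype(float).copy()
    y0[i, j] = 0.0                  # missing observation replaced by zero
    x = np.zeros_like(y0)
    x[i, j] = 1.0                   # pseudo-variate: one at the missing cell, zero elsewhere
    exy = np.sum(double_centre(y0) * double_centre(x))   # error sum of products
    exx = np.sum(double_centre(x) ** 2)                  # error sum of squares of x
    b = exy / exx                   # error regression coefficient of y on x
    return -b                       # adjusted value of the zero entry: 0 - b * 1


def missing_value_direct(y: np.ndarray, i: int, j: int) -> float:
    """Direct formula (rB + tT - G) / ((r - 1)(t - 1)) for one missing cell."""
    y0 = y.astype(float).copy()
    y0[i, j] = 0.0
    r, t = y0.shape                 # r blocks (rows), t treatments (columns)
    block_total, trt_total, grand = y0[i].sum(), y0[:, j].sum(), y0.sum()
    return (r * block_total + t * trt_total - grand) / ((r - 1) * (t - 1))


if __name__ == "__main__":
    table = np.array([[10.0, 12.0, 11.0, 14.0],
                      [9.0, 13.0, np.nan, 15.0],    # cell (1, 2) is missing
                      [11.0, 14.0, 12.0, 16.0]])
    print(missing_value_by_covariance(table, 1, 2), missing_value_direct(table, 1, 2))
```

The two routes agree because the error sum of products of y with a single-cell pseudo-variate picks out exactly the block, treatment and grand totals that appear in the direct formula.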


METHOD OF ANALYSIS 


Consider the experiment on the growth of chickens given by Yates. 
The total bird weights are given in Table 1, and the number of birds in 
each group is indicated in brackets. 


TABLE 1
TOTAL BIRD WEIGHTS


Treatment      A             B             C             Total

Cockerels      14.10 (5)     22.50 (9)     33.00 (12)     69.60 (26)
Pullets        19.90 (10)    10.98 (6)      5.46 (3)      36.34 (19)
Total          34.00 (15)    33.48 (15)    38.46 (15)    105.94 (45)


If a pseudo-variate of one is used for each of the cockerels and zero for 
each of the pullets, the analysis of covariance may be set out as in Table 2. 
TABLE 2
ANALYSIS OF COVARIANCE

                        d.f.   S.S. of     Sum of      S.S. of
                               weights     products    pseudo-variate
Treatments                2     0.9992      0.9795      1.6444
Error                    42     9.0534      7.4107      9.3334
Total                    44    10.0526      8.3902     10.9778

                          Regression       Deviations from regression
                          d.f.   S.S.      d.f.   S.S.
Treatments                                   2    0.4708
Error                      1    5.8841      41    3.1693
Treatments + Error         1    6.4125      43    3.6401


However, this analysis may alternatively be set out as in Table 3.


TABLE 3
                                              d.f.   S.S.
Sex                                             1    6.4125
Total (eliminating sex)                        43    3.6401
Total                                          44   10.0526

Sex (eliminating treatments)                    1    5.8841
Error (eliminating sex and treatments)         41    3.1693
Error (eliminating treatments)                 42    9.0534

Treatments (eliminating sex)                    2    0.4708
Error                                          41    3.1693
Total (eliminating sex)                        43    3.6401
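The entries of Tables 2 and 3 can be recomputed from the totals in Table 1, because the pseudo-variate x is 1 for a cockerel and 0 for a pullet: Σx = Σx² = number of cockerels and Σxy = total cockerel weight. The only quantity that needs the individual weights is the total sum of squares of y, which this sketch takes from Table 2. It is offered as an arithmetic check; the variable names are illustrative.

```python
# Treatment totals from Table 1: (total weight, number of cockerels, number of birds)
TREATMENTS = {"A": (34.00, 5, 15), "B": (33.48, 9, 15), "C": (38.46, 12, 15)}
COCKEREL_WEIGHT = 69.60   # sum of x*y: total weight of all cockerels
TOTAL_SS_Y = 10.0526      # total S.S. of weights, taken from Table 2

n = sum(birds for _, _, birds in TREATMENTS.values())
gy = sum(w for w, _, _ in TREATMENTS.values())        # grand total of y
gx = sum(c for _, c, _ in TREATMENTS.values())        # grand total of x

# Total line: x is 0/1, so sum(x^2) equals sum(x)
tot_yy = TOTAL_SS_Y
tot_xy = COCKEREL_WEIGHT - gx * gy / n
tot_xx = gx - gx * gx / n

# Treatments line: between-treatment sums of squares and products
trt_yy = sum(w * w / m for w, _, m in TREATMENTS.values()) - gy * gy / n
trt_xy = sum(w * c / m for w, c, m in TREATMENTS.values()) - gx * gy / n
trt_xx = sum(c * c / m for _, c, m in TREATMENTS.values()) - gx * gx / n

# Error line = Total - Treatments
err_yy, err_xy, err_xx = tot_yy - trt_yy, tot_xy - trt_xy, tot_xx - trt_xx

sex_elim_treatments = err_xy ** 2 / err_xx              # regression within treatments, 1 d.f.
error_elim_both = err_yy - sex_elim_treatments          # deviations, 41 d.f.
total_elim_sex = tot_yy - tot_xy ** 2 / tot_xx          # deviations on the total line, 43 d.f.
treatments_elim_sex = total_elim_sex - error_elim_both  # 2 d.f.

if __name__ == "__main__":
    print(f"Error line: {err_yy:.4f} {err_xy:.4f} {err_xx:.4f}")
    print(f"Sex (eliminating treatments): {sex_elim_treatments:.4f}")
    print(f"Error (eliminating sex and treatments): {error_elim_both:.4f}")
    print(f"Treatments (eliminating sex): {treatments_elim_sex:.4f}")
```

Differences in the fourth decimal place reflect rounding in the printed tables.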
This analysis, apart from rounding-off errors, agrees with the analysis
given by Yates, although, in order to complete the analysis, the
interaction of sex and treatments must be tested by removing the joint effect
of sex and treatments, given by treatments + sex (eliminating
treatments), from the between-class sum of squares. This has been done in
Table 4.


TABLE 4
COMPLETED ANALYSIS


                                  d.f.   S.S.      m.s.
Treatments (eliminating sex)        2    0.4708    0.2354
Sex (eliminating treatments)        1    5.8841    5.8841
Sex and Treatments                  3    6.8833
Interaction                         2    0.1039    0.0520
Between classes                     5    6.9872
Error                              39    3.0654    0.0786
Total                              44   10.0526
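The between-classes line of Table 4 comes from the six cell totals of Table 1; the interaction and error lines then follow by subtraction. A sketch of that bookkeeping, taking the total, treatments (ignoring sex) and sex (eliminating treatments) lines from Table 2; the names are illustrative.

```python
# Cell totals from Table 1: (total weight, number of birds)
CELLS = [(14.10, 5), (22.50, 9), (33.00, 12),   # cockerels: treatments A, B, C
         (19.90, 10), (10.98, 6), (5.46, 3)]    # pullets: treatments A, B, C
TOTAL_SS = 10.0526              # total S.S., 44 d.f. (Table 2)
TREATMENTS_SS = 0.9992          # treatments ignoring sex, 2 d.f. (Table 2)
SEX_ELIM_TREATMENTS = 5.8841    # sex eliminating treatments, 1 d.f. (Table 2)

n = sum(m for _, m in CELLS)
g = sum(w for w, _ in CELLS)
between = sum(w * w / m for w, m in CELLS) - g * g / n   # between classes, 5 d.f.
joint = TREATMENTS_SS + SEX_ELIM_TREATMENTS              # sex and treatments, 3 d.f.
interaction = between - joint                            # 2 d.f.
error = TOTAL_SS - between                               # 39 d.f.
error_ms = error / 39
interaction_ms = interaction / 2
f_interaction = interaction_ms / error_ms

if __name__ == "__main__":
    print(f"between = {between:.4f}, interaction = {interaction:.4f}, "
          f"error m.s. = {error_ms:.4f}, F = {f_interaction:.2f}")
```

The interaction mean square is smaller than the error mean square (a variance ratio below 1), which is the sense in which the interaction is negligible.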


The interaction sum of squares in this case is negligible, so that the basic
assumption of an equal effect of sex in the three treatment groups is
justified, but if the interaction were not negligible, further analysis would
be necessary. This might be carried out by using three pseudo-variates,
a, b, and c, which take the value one for the cockerels of each treatment
group in turn and zero elsewhere, so that treatments (eliminating sex
and sex-interaction), and sex and sex-interaction (eliminating
treatments) are estimated. This analysis, although lengthy, is shortened by
the fact that the error terms in the sums of products of the
pseudo-variates are all zero, so that each variate acts independently.¹

However this is not true for the totals so that it becomes necessary 
either to solve three simultaneous equations, or to carry out an analysis 


¹Cross-checks are also provided by adding the covariance analyses of the pseudo-variates with the
observations to obtain the analysis of covariance given in Table 2.
on sex eliminating the effect of treatments and their interactions by
using four pseudo-variates. This latter method is exactly equivalent 
to the method of weighted squares of means. 

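The test of the difference between any pair or group of treatments
can be carried out by the formulae and methods of the analysis of
covariance in the usual manner. For example, the adjusted mean difference
between treatments 1 and 2 (A and B) is

$$\frac{1}{15}\left[\,34.00 - 33.48 + 4 \times \frac{7.4107}{9.3334}\,\right] = 0.2464$$

where 4 is the excess of cockerels in treatment B over treatment A (9 − 5)
and 7.4107/9.3334 is the error regression of weight on the pseudo-variate,
and its variance is

$$0.0786\left[\frac{1}{15} + \frac{1}{15} + \frac{(4/15)^2}{9.3334}\right] = 0.01108$$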
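The adjusted difference and its variance can be checked numerically. This sketch uses the standard analysis-of-covariance variance of a difference between adjusted means, s²[1/n₁ + 1/n₂ + (x̄₁ − x̄₂)²/E_xx], with the error mean square from Table 4 and the error line of Table 2; the helper names are hypothetical.

```python
E_XY, E_XX = 7.4107, 9.3334   # error sum of products and pseudo-variate S.S. (Table 2)
ERROR_MS = 0.0786             # error mean square, 39 d.f. (Table 4)
N = 15                        # birds per treatment


def adjusted_difference(total_1: float, total_2: float, cocks_1: int, cocks_2: int) -> float:
    """Difference of treatment means adjusted to equal proportions of cockerels."""
    b = E_XY / E_XX                                   # error regression of weight on sex
    return (total_1 - total_2 - b * (cocks_1 - cocks_2)) / N


def variance_of_difference(cocks_1: int, cocks_2: int) -> float:
    """s^2 [1/n1 + 1/n2 + (xbar1 - xbar2)^2 / Exx] for the adjusted difference."""
    dx = (cocks_1 - cocks_2) / N                      # difference in proportion of cockerels
    return ERROR_MS * (1 / N + 1 / N + dx * dx / E_XX)


if __name__ == "__main__":
    d = adjusted_difference(34.00, 33.48, 5, 9)
    v = variance_of_difference(5, 9)
    print(f"adjusted difference = {d:.4f}, variance = {v:.5f}, t = {d / v ** 0.5:.2f}")
```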


A second example of the same method is provided by the experiment
given in Table 5. This experiment was designed to test nine treatments
in three blocks, but a large and unaccountable trend showed across the
blocks.²


TABLE 5

Block   Treat-   Yield,   Treat-   Yield,   Treat-   Yield,   Total
        ment     y        ment     y        ment     y
1       3         4.6     7         6.4     4        11.4
        2         4.5     9         9.5     1        11.1
        8         9.2     5        11.4     6        12.9      81.0
2       9         9.1     4        11.5     8        15.1
        7         8.3     6        13.2     3        11.3
        2         5.3     1        11.3     5        12.0      97.1
3       5         4.8     7        13.7     2        10.3
        6         3.4     8        11.9     9        12.9
        3         2.6     1         8.6     4        11.2      79.4

Total            51.8              97.5             108.2     257.5


²This example should be compared with that given by Cochran [4].
Although there is some doubt as to whether the conditions of the
analysis of variance are satisfied, it is interesting to see how this may be
analysed using pseudo-variates x₁, which takes the value one in column 1
and zero elsewhere, and x₂, which takes the value one in column 2 and zero
elsewhere. The analysis of covariance for this design is given in Table 6.


TABLE 6
                 d.f.   S.S.(y)   S.p.(x₁y)  S.p.(x₂y)  S.S.(x₁)  S.p.(x₁x₂)  S.S.(x₂)

Blocks             2     21.30      0.00       0.00       0.00      0.00        0.00
Treatments         8     92.49     −8.83       6.94       1.33     −1.00        1.33
Error             16    193.46    −25.20       4.73       4.67     −2.00        4.67
Total             26    307.25    −34.03      11.67       6.00     −3.00        6.00
Treatments +
  Error           24    285.95    −34.03      11.67       6.00     −3.00        6.00

                          d.f.   S.S.     m.s.

Treatments                  8    38.87    4.86
Error                      14    47.73    3.41
Treatments + Error         22    86.60
Columns                     2   199.35
Blocks                      2    21.30
Total                      26   307.25
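The covariance analysis of Table 6 can be reproduced from the plot yields of Table 5. This is a sketch assuming only the layout as printed (each treatment once per block, three columns per block); the names are illustrative. Differences of about 0.05 in the adjusted lines arise because the paper worked from sums of products rounded to two decimals.

```python
import numpy as np

# Table 5: for each block, three rows of (treatment, yield) in columns 1, 2, 3
LAYOUT = [
    [[(3, 4.6), (7, 6.4), (4, 11.4)],
     [(2, 4.5), (9, 9.5), (1, 11.1)],
     [(8, 9.2), (5, 11.4), (6, 12.9)]],
    [[(9, 9.1), (4, 11.5), (8, 15.1)],
     [(7, 8.3), (6, 13.2), (3, 11.3)],
     [(2, 5.3), (1, 11.3), (5, 12.0)]],
    [[(5, 4.8), (7, 13.7), (2, 10.3)],
     [(6, 3.4), (8, 11.9), (9, 12.9)],
     [(3, 2.6), (1, 8.6), (4, 11.2)]],
]

rows = [(blk, trt, col, y)
        for blk, block in enumerate(LAYOUT)
        for line in block
        for col, (trt, y) in enumerate(line)]
block_id, treatment, column, yields = (np.array(v) for v in zip(*rows))

# Variates: y, x1 (one in column 1), x2 (one in column 2)
V = np.column_stack([yields, column == 0, column == 1]).astype(float)
n = len(V)
correction = np.outer(V.sum(axis=0), V.sum(axis=0)) / n


def between(groups: np.ndarray) -> np.ndarray:
    """Between-group sums of squares and products for y, x1, x2."""
    out = -correction
    for g in np.unique(groups):
        s = V[groups == g].sum(axis=0)
        out = out + np.outer(s, s) / np.count_nonzero(groups == g)
    return out


def adjusted_ss(m: np.ndarray) -> float:
    """S.S. of y after regression on x1 and x2: Syy - s' Sxx^-1 s."""
    return m[0, 0] - m[0, 1:] @ np.linalg.solve(m[1:, 1:], m[0, 1:])


total = V.T @ V - correction
blocks = between(block_id)
treatments = between(treatment)
error = total - blocks - treatments     # valid because each treatment occurs once per block

error_adj = adjusted_ss(error)                                  # 16 - 2 = 14 d.f.
treatments_adj = adjusted_ss(treatments + error) - error_adj    # 8 d.f.
f_ratio = (treatments_adj / 8) / (error_adj / 14)

if __name__ == "__main__":
    print(np.round(error, 2))
    print(f"adjusted error = {error_adj:.2f}, adjusted treatments = {treatments_adj:.2f}, "
          f"F = {f_ratio:.2f}")
```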


The values involved in this analysis of covariance are easily calculated,
and the whole process is less tedious than the matrix inversion that would
be necessary if constants were fitted. It is again possible to test
particular sets of treatments. For example, the main purpose of this
experiment was to compare plots receiving no lime (treatments 1, 2, 3) with
those receiving a dressing of limestone (treatments 4, 5, 6) and a dressing
of slag (treatments 7, 8, 9). This may be done by the analysis of
covariance as shown in Table 7.
TABLE 7
                 d.f.   S.S.(y)   S.p.(x₁y)  S.p.(x₂y)  S.S.(x₁)  S.p.(x₁x₂)  S.S.(x₂)
Lime               2     44.95     −2.46       2.95       0.22     −0.11        0.22
Lime + Error      18    238.41    −27.66       7.68       4.89     −2.11        4.89

                 d.f.   S.S.     m.s.    Variance ratio
Lime               2    29.62    14.81    4.36
Error             14    47.73     3.40
Lime + Error      16    77.35
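The adjusted lime comparison needs one more regression: the Lime + Error line is adjusted for x₁ and x₂, and the adjusted error of Table 6 is subtracted. A sketch working from the printed, two-decimal figures of Tables 6 and 7, so it matches the printed results only to about 0.05; the names are illustrative.

```python
import numpy as np

# Lime + Error line of Table 7, variates ordered y, x1, x2
LIME_PLUS_ERROR = np.array([[238.41, -27.66, 7.68],
                            [-27.66, 4.89, -2.11],
                            [7.68, -2.11, 4.89]])
ERROR_ADJ, ERROR_DF = 47.73, 14     # adjusted error line of Table 6


def adjusted_ss(m: np.ndarray) -> float:
    """S.S. of y after regression on x1 and x2: Syy - s' Sxx^-1 s."""
    return m[0, 0] - m[0, 1:] @ np.linalg.solve(m[1:, 1:], m[0, 1:])


lime_plus_error_adj = adjusted_ss(LIME_PLUS_ERROR)     # 18 - 2 = 16 d.f.
lime_adj = lime_plus_error_adj - ERROR_ADJ             # 2 d.f.
variance_ratio = (lime_adj / 2) / (ERROR_ADJ / ERROR_DF)

if __name__ == "__main__":
    print(f"Lime + Error (adjusted) = {lime_plus_error_adj:.2f}, "
          f"Lime = {lime_adj:.2f}, variance ratio = {variance_ratio:.2f}")
```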
Under the discussion of “A Quantitative Theory of Genetic
Recombination and Chiasma Formation” by R. A. Fisher, the parenthetical
comment “(Such a notation was presented orally, but is omitted from
the written proceedings subject to publication elsewhere.)” does not
apply to the discussion by Alexander Weinstein but to that of J.
Lederberg.
ON A FORMULA FOR THE PREDICTION
OF CRANIAL CAPACITY


C. RADHAKRISHNA RAO
and


D. C. SHAW


Duckworth Laboratory, 
Cambridge, England 


INTRODUCTION 


One of the uses of the regression equation is the prediction of
the dependent variate for a given set of concomitant variates.
For instance, a skull may be broken so that its actual cranial capacity
cannot be determined. In such a case the capacity can still be
predicted if at least some external measurements are available.
This requires the construction of the regression equation between the
cranial capacity and the observed set of external measurements on
the skull.

Various formulae have been constructed for this purpose, and the
most widely used are those of Isserlis (1914), Hrdlička (1925), Hooke
(1926), etc.

In this article we suggest a new formula for the regression equation
and derive the constants from the measurements given by Hooke (1926)
for 86 male skulls excavated from Farringdon Street, London.

The statistical methods used in comparing various formulae have 
been presented in full for the convenience of biometricians who may be 
interested in such studies. 


The regression equation. 


Three important measurements from which the cranial capacity (C)
could be predicted are the glabella-occipital length (L), the maximum
parietal breadth (B) and the basio-bregmatic height (H’). Since the
magnitude to be estimated is a volume, it is appropriate to set up a
regression formula of the type

$$C = a' L^{\beta_1} B^{\beta_2} H'^{\beta_3}$$

where a’, β₁, β₂ and β₃ are the constants to be estimated. Transforming
the variables to

$$y = \log_{10} C, \quad x_1 = \log_{10} L, \quad x_2 = \log_{10} B, \quad x_3 = \log_{10} H'$$
the formula can be written as

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

where α = log₁₀ a’. Starting from this equation we propose to estimate
the constants by the method of least squares.


Estimation of the constants. 


Using the measurements on the 86 male skulls from the Farringdon
series we find the mean values

$$\bar y = 3.1685, \quad \bar x_1 = 2.2752, \quad \bar x_2 = 2.1523, \quad \bar x_3 = 2.1128.$$


The corrected sum of products matrix (S_ij) for x₁, x₂, x₃ is

$$(S_{ij}) = \begin{pmatrix} .01875 & .00848 & .00684 \\ .00848 & .02904 & .00878 \\ .00684 & .00878 & .02886 \end{pmatrix}$$

The corrected sums of products of y with x₁, x₂ and x₃ are respectively

$$Q_1 = .03030, \quad Q_2 = .04410, \quad Q_3 = .03629$$


The reciprocal of the matrix (S_ij), obtained by the c-matrix method of
Fisher, is

$$(c_{ij}) = \begin{pmatrix} 64.21 & -15.57 & -10.49 \\ -15.57 & 41.71 & -9.00 \\ -10.49 & -9.00 & 39.88 \end{pmatrix}$$

The estimates of the parameters are

$$\begin{aligned} b_1 &= 64.21\,Q_1 - 15.57\,Q_2 - 10.49\,Q_3 = .878 \\ b_2 &= -15.57\,Q_1 + 41.71\,Q_2 - 9.00\,Q_3 = 1.041 \\ b_3 &= -10.49\,Q_1 - 9.00\,Q_2 + 39.88\,Q_3 = .733 \end{aligned}$$
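The c-matrix and the indices can be recomputed from the printed S and Q with NumPy. This sketch inverts S explicitly because the elements c_ij are reused later for variances; `np.linalg.solve` would be preferable if only b were needed. Because S is printed to five decimals, the computed c_ij agree with the printed ones only to about 0.02. The constant a’ is obtained from α = ȳ − Σ bᵢx̄ᵢ.

```python
import numpy as np

S = np.array([[.01875, .00848, .00684],
              [.00848, .02904, .00878],
              [.00684, .00878, .02886]])   # corrected S.S. and products of x1, x2, x3
Q = np.array([.03030, .04410, .03629])      # corrected products of y with x1, x2, x3
MEANS_X = np.array([2.2752, 2.1523, 2.1128])
MEAN_Y = 3.1685

C = np.linalg.inv(S)             # the c-matrix (reciprocal of S)
b = C @ Q                        # least-squares indices b1, b2, b3
alpha = MEAN_Y - b @ MEANS_X     # intercept on the log10 scale
a_prime = 10 ** alpha            # multiplier a' in C = a' L^b1 B^b2 H'^b3

if __name__ == "__main__":
    print(np.round(C, 2))
    print(np.round(b, 3))
    print(f"a' = {a_prime:.5f}")
```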


The formula for the prediction of cranial capacity¹ is

$$C = .00241\, L^{.878} B^{1.041} H'^{.733}$$


Tests of hypotheses.²


¹The capacities of Farringdon series skulls were determined by tight packing with mustard seed
and weighing in the manner described by Macdonell (1904). The formula is strictly applicable for
predicting capacities determined in this way.

²The general theory of tests of linear hypotheses is discussed by one of the authors (Rao, 1946).
For exact tests of significance it is necessary that the residuals should be normally distributed. It has
been shown by various writers that the analysis of variance tests hold good provided the departure
from normality is not large.
Having estimated these constants, it is relevant to examine how far
the concomitant variables are helpful in prediction. If these variables
are of no use, then the prediction formula does not depend on them, so
that β₁ = β₂ = β₃ = 0. This hypothesis may be tested from the above
data.

The residual sum of squares with (n − 4) d.f. is the minimum value of

$$\sum (y - \alpha - \beta_1 x_1 - \beta_2 x_2 - \beta_3 x_3)^2$$

which is

$$\begin{aligned} (\textstyle\sum y^2 - n\bar y^2) - b_1Q_1 - b_2Q_2 - b_3Q_3 &= .12692 - .878(.03030) - 1.041(.04410) - .733(.03629) \\ &= .12692 - .09911 = .02781 \qquad (1) \end{aligned}$$

If the hypothesis β₁ = β₂ = β₃ = 0 is true, then the minimum value
of Σ(y − α)² is Σy² − nȳ² = .12692, which is the total sum of squares
with (n − 1) d.f. The reduction from this total to the residual (1), namely
.09911, is due to regression. The analysis of the sum of squares is shown below.


TABLE 1
TEST OF THE HYPOTHESIS β₁ = β₂ = β₃ = 0

              d.f.   S.S.      M.S.       F
Regression      3   .09911   .033037    97.41
Residual       82   .02781   .0003391
Total          85   .12692


The variance ratio 97.41 with 3 and 82 degrees of freedom is significant
at the 1% level, which shows that the variables considered above are useful
in prediction.
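The entries of Table 1 follow directly from the printed estimates: the regression sum of squares is Σ bᵢQᵢ and the residual is the total minus that. A short check using the rounded values given in the text; the names are illustrative.

```python
SYY, N = .12692, 86              # total corrected S.S. of y and number of skulls
b = [.878, 1.041, .733]          # estimated indices
Q = [.03030, .04410, .03629]     # corrected products of y with x1, x2, x3

regression_ss = sum(bi * qi for bi, qi in zip(b, Q))   # 3 d.f.
residual_ss = SYY - regression_ss                      # n - 4 = 82 d.f.
residual_ms = residual_ss / (N - 4)
f_ratio = (regression_ss / 3) / residual_ms

if __name__ == "__main__":
    print(f"regression = {regression_ss:.5f}, residual = {residual_ss:.5f}, F = {f_ratio:.2f}")
```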

It may now be examined whether the three linear dimensions appear
to the same degree in the prediction formula. From the estimates it is
seen that the index b₂ for maximum parietal breadth is higher than the
others. This means that a given proportional increase in breadth counts for more
in capacity than the corresponding increase in length or height.

The hypothesis relevant to examine this point is

$$\beta_1 = \beta_2 = \beta_3 = \beta \ \text{(say)}$$

250 BIOMETRICS, DECEMBER 1948 
The minimum value of Σ{yᵢ − α − β(x₁ᵢ + x₂ᵢ + x₃ᵢ)}² has to be
found. The normal equation giving the estimate of β is

$$b\,[S_{11} + S_{22} + S_{33} + 2(S_{12} + S_{13} + S_{23})] = Q_1 + Q_2 + Q_3$$

$$.12485\, b = .11069, \qquad b = .8866$$


The minimum value, with (n − 2) d.f., is

$$(\textstyle\sum y^2 - n\bar y^2) - b(Q_1 + Q_2 + Q_3) = .12692 - .09814 = .02878$$


TABLE 2
TEST OF THE HYPOTHESIS β₁ = β₂ = β₃

                           d.f.   S.S.      M.S.       F
Deviation from equality      2   .00097   .000485    1.430
Residual                    82   .02781   .0003391
Total                       84   .02878


The ratio is not significant, so there is no evidence from these
data that the β’s are different. The difference, if any, is
likely to be small, and a large collection of measurements may be
necessary before anything definite can be said about this. Evolutionists
believe that the breadth is increasing relatively more than any other
magnitude on the skull. If this is true, it is of interest to examine how
far the cranial capacity is influenced by the breadth.
So far as the problem of prediction is concerned, the formula

$$C = .002342\,(L\,B\,H')^{.8866}$$

obtained by assuming β₁ = β₂ = β₃, may be as useful as the formula
derived without assuming that these are equal. The variance of the
estimate b of β is $\sigma^2/\sum\sum S_{ij}$, where σ² is the estimate based on 84 d.f.
with the corresponding sum of squares given in Table 2.
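The common-index fit needs only the grand sums of S and Q. This sketch reproduces b, the Table 2 variance ratio and the multiplier .002342 from the printed summary statistics; the residual of the full model is taken from Table 1, and the names are illustrative.

```python
import numpy as np

S = np.array([[.01875, .00848, .00684],
              [.00848, .02904, .00878],
              [.00684, .00878, .02886]])
Q = np.array([.03030, .04410, .03629])
SYY, N = .12692, 86
RESIDUAL_FULL = .02781                        # residual with three separate indices, 82 d.f.
MEANS_X = np.array([2.2752, 2.1523, 2.1128])
MEAN_Y = 3.1685

b_common = Q.sum() / S.sum()                  # normal equation: (sum of all S_ij) b = Q1 + Q2 + Q3
residual_equal = SYY - b_common * Q.sum()     # n - 2 = 84 d.f.
deviation = residual_equal - RESIDUAL_FULL    # 2 d.f.
f_ratio = (deviation / 2) / (RESIDUAL_FULL / (N - 4))
a_prime = 10 ** (MEAN_Y - b_common * MEANS_X.sum())    # multiplier in C = a'(L B H')^b
var_b = (residual_equal / (N - 2)) / S.sum()           # sigma^2 / sum of S_ij

if __name__ == "__main__":
    print(f"b = {b_common:.4f}, F = {f_ratio:.3f}, a' = {a_prime:.6f}, s.e.(b) = {var_b ** 0.5:.4f}")
```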

A simple formula of the type C = aLBH’ is sometimes used for
predicting the cranial capacity. A test of the adequacy of such a formula
is equivalent to testing the hypothesis β₁ = β₂ = β₃ = 1.
The minimum value of Σ(y − α − β₁x₁ − β₂x₂ − β₃x₃)², assuming
this to be true, is

$$(\textstyle\sum y^2 - n\bar y^2) + S_{11} + S_{22} + S_{33} + 2S_{12} + 2S_{13} + 2S_{23} - 2(Q_1 + Q_2 + Q_3) = .12692 + .12485 - .22138 = .03039$$

which has (n − 1) degrees of freedom. The residual has (n − 4) degrees
of freedom, so that the difference, with 3 d.f., is due to deviation from the
hypothesis.


TABLE 3
TEST OF THE HYPOTHESIS β₁ = β₂ = β₃ = 1

                          d.f.   S.S.      M.S.       F
Deviation from β = 1        3   .00258   .00086     2.54
Residual                   82   .02781   .0003391
Total                      85   .03039


The ratio 2.54 with 3 and 82 d.f. is not significant at the 5% level, so these
data do not by themselves show that the prediction would be bettered by
choosing the three indices freely.


In the above table the sum of squares due to deviation from the
hypothesis could be directly calculated from the formula

$$\begin{aligned} \sum_i\sum_j S_{ij}(b_i - 1)(b_j - 1) &= \sum_i (b_i - 1)(Q_i - S_{i1} - S_{i2} - S_{i3}) \\ &= b_1Q_1 + b_2Q_2 + b_3Q_3 - 2(Q_1 + Q_2 + Q_3) + \sum_i\sum_j S_{ij} \\ &= .09911 - .22138 + .12485 = .00258 \end{aligned}$$

which is the same as that given in Table 3.
Although the joint test does not establish that the indices differ from unity, it
is of some interest to examine whether they add up to 3 while being
distributed unequally among the three dimensions used. This requires
the test of the hypothesis β₁ + β₂ + β₃ = 3. The best estimate of the
deviation is b₁ + b₂ + b₃ − 3 = 2.652 − 3 = −.348

with its variance

$$\sigma^2 \sum_i \sum_j c_{ij} = 75.68\,\sigma^2$$

The ratio with 1 and 82 d.f. is

$$\frac{(.348)^2}{75.68 \times .0003391} = 4.72$$

which is significant at the 5% level. This shows that the number of
dimensions of the prediction formula is not 3.
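Both tests in this part can be computed from S and Q alone, which gives an independent check on the arithmetic. This sketch evaluates the quadratic form ΣΣ Sᵢⱼ(bᵢ − 1)(bⱼ − 1) for the hypothesis β₁ = β₂ = β₃ = 1 and the one-degree-of-freedom test of β₁ + β₂ + β₃ = 3. It uses the exact inverse of S rather than the rounded c-matrix, so results can differ slightly in the last figure; the names are illustrative.

```python
import numpy as np

S = np.array([[.01875, .00848, .00684],
              [.00848, .02904, .00878],
              [.00684, .00878, .02886]])
Q = np.array([.03030, .04410, .03629])
SYY, N = .12692, 86

C = np.linalg.inv(S)
b = C @ Q
residual_full = SYY - b @ Q                    # 82 d.f.
s2 = residual_full / (N - 4)                   # residual mean square

d = b - 1.0
deviation_unit = d @ S @ d                     # sum_ij S_ij (b_i - 1)(b_j - 1), 3 d.f.
residual_unit = SYY - 2 * Q.sum() + S.sum()    # residual with all indices fixed at 1, n - 1 d.f.
f_unit = (deviation_unit / 3) / s2

excess = b.sum() - 3.0                         # estimate of beta1 + beta2 + beta3 - 3
f_sum = excess ** 2 / (C.sum() * s2)           # variance of the sum is sigma^2 * sum of c_ij

if __name__ == "__main__":
    print(f"deviation from unity: S.S. = {deviation_unit:.5f}, F(3, 82) = {f_unit:.2f}")
    print(f"sum of indices - 3 = {excess:.3f}, F(1, 82) = {f_sum:.2f}")
```

The two expressions for the deviation sum of squares, the quadratic form and the difference of residuals, agree identically because Σⱼ Sᵢⱼbⱼ = Qᵢ at the least-squares solution.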


The use of the formula for a single skull. 


A skull with L = 198.5, B = 147, H’ = 131, i.e., x₁ = 2.298, x₂ =
2.167, x₃ = 2.117, will have the estimated log capacity

$$\hat y = \bar y + b_1(x_1 - \bar x_1) + b_2(x_2 - \bar x_2) + b_3(x_3 - \bar x_3) = 3.2069$$

$$C = \text{antilog}\ 3.2069 = 1610.$$

$$V(\hat y) = \sigma^2(.04187) = .00001420 \quad \text{using the estimated value of } \sigma^2$$

$$V(C) = (C \log_e 10)^2\, V(\hat y) \ \text{approximately} = 195.2$$

The covariance of bᵢ, bⱼ is σ² times the element in the i-th row and j-th column of
the matrix reciprocal to (S_ij).
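The single-skull prediction and its approximate variance can be reproduced as follows. This sketch takes the x values and the bracketed factor .04187 exactly as given in the text, and converts the variance from the log scale with the usual first-order (delta-method) approximation, dC/dy = C logₑ10 for y = log₁₀C; the names are illustrative.

```python
import math

MEAN_Y = 3.1685
MEANS_X = (2.2752, 2.1523, 2.1128)
B = (.878, 1.041, .733)
SIGMA2 = .0003391          # residual mean square, 82 d.f.
QUAD_FACTOR = .04187       # factor multiplying sigma^2 in V(y-hat), as given in the text


def predict_log_capacity(x):
    """y-hat = ybar + sum of b_i (x_i - xbar_i)."""
    return MEAN_Y + sum(b * (xi - m) for b, xi, m in zip(B, x, MEANS_X))


x_skull = (2.298, 2.167, 2.117)     # log10 of the skull's L, B, H' as given in the text
y_hat = predict_log_capacity(x_skull)
capacity = 10 ** y_hat
var_y = SIGMA2 * QUAD_FACTOR
var_c = (capacity * math.log(10)) ** 2 * var_y    # delta method on y = log10 C

if __name__ == "__main__":
    print(f"y-hat = {y_hat:.4f}, C = {capacity:.0f} ccs, V(C) = {var_c:.1f}, "
          f"s.e. = {math.sqrt(var_c):.1f} ccs")
```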


The use of the formula in estimating the mean capacity. 


The formula can also be used to estimate the mean cranial capacity 
of a series of skulls. For this purpose two methods are available. We 
may estimate the cranial capacity of individual skulls, and calculate the 
mean of these estimates, or we may apply the formula directly to the 
mean values of L, B, H’ for the series. It is of interest to know whether these
two methods give the same results. For this purpose estimates were
made of the mean cranial capacity of a further 29 male skulls of the
Farringdon Street series for which measurements of L, B, H’, but not
of C, were available.

For these 29 skulls the mean of L is 191.1 mm., of B is 143.1 mm. and
of H’ is 129.0 mm.

Applying the formula $C = .00241\,L^{.878}B^{1.041}H'^{.733}$ to these mean
values, we estimate the mean of C to be 1498.3 ccs. If we estimate C
for the 29 skulls individually and take the mean of the 29 estimates, we
get an estimate of the mean value of C equal to 1498.2 ccs.

The same estimates were calculated for the 22 male skulls of the
“Moorfields” series (Hooke, 1926), for which all four measurements were
available. For these 22 skulls the mean of L is 189.5 mm., of B is 142.5
mm. and of H’ is 128.8 mm., giving an estimate of the mean of C equal
to 1479.0 ccs. If we estimate C for the 22 skulls individually and
calculate the mean, we get an estimate of the mean of C equal to 1480.0 ccs.
Thus it appears that the two methods give very nearly the
same estimates.
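Applying the formula to mean dimensions is a one-line computation. This sketch uses the rounded multiplier .00241, so its results agree with the figures in the text only to within about 2 ccs; the function name is illustrative. The close agreement between the two methods is what one would expect when C varies smoothly, and nearly linearly, over the small range of dimensions within a series.

```python
A_PRIME, B1, B2, B3 = .00241, .878, 1.041, .733


def capacity(length: float, breadth: float, height: float) -> float:
    """C = a' L^b1 B^b2 H'^b3 with the constants fitted to the Farringdon series."""
    return A_PRIME * length ** B1 * breadth ** B2 * height ** B3


farringdon_29 = capacity(191.1, 143.1, 129.0)   # means of the 29 unmeasured Farringdon skulls
moorfields_22 = capacity(189.5, 142.5, 128.8)   # means of the 22 fully measured Moorfields skulls

if __name__ == "__main__":
    print(f"Farringdon (29): {farringdon_29:.1f} ccs, Moorfields (22): {moorfields_22:.1f} ccs")
```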


Are only small skulls preserved? 


A point of some interest is that, whereas the mean value of C for the
Farringdon Street series as calculated from 86 measured values is 1481.3
ccs., the mean value of C as estimated by our formula from the 29 skulls
for which measurements of L, B, H’ but not of C are available is 1498.3
ccs.

Again, for the Moorfields series, the mean of L is 189.2 mm. based on
44 measurements, of B is 143.0 mm. based on 46 measurements, and of
H’ is 129.8 mm. based on 34 measurements. Applying our formula to
these mean values (as we may do with some confidence after the results
in the last section), we obtain an estimate of the mean of C equal to
1490.7 ccs. The mean of C as calculated from 22 measured values is
1473.8 ccs.

The above results suggest that those skulls which are damaged to 
such an extent that the cranial capacity cannot be measured are on the 
whole larger than those that remain intact. 
THE COMPONENTS OF GENETIC VARIANCE IN
POPULATIONS OF BIPARENTAL PROGENIES AND 
THEIR USE IN ESTIMATING 
THE AVERAGE DEGREE OF DOMINANCE* 


R. E. COMSTOCK AND H. F. ROBINSON†


The phenotypic expression of a character can be considered the
sum of a genetic effect and a deviation attributable to environment
and to interaction between the genotype and environment involved.
Symbolically,

$$p = y + e$$

where p is the phenotype; y, the genetic effect; and e, the deviation of
p from y. Then, if genotypes are randomly distributed relative to
variations in environment, phenotypic variance is

$$\sigma_p^2 = \sigma_g^2 + \sigma_e^2 \qquad (1)$$

where $\sigma_g^2$ is the variance of genetic effects, and $\sigma_e^2$ is the portion of $\sigma_p^2$ resulting
from variation in environment.
Wright [9] defined three components of $\sigma_g^2$ as follows:


1. Additive genetic variance 
2. Variance due to dominance deviations from the additive scheme 
3. Variance due to epistatic deviations from the additive scheme. 


These components will be symbolized in what follows as $\sigma_A^2$, $\sigma_D^2$, and
$\sigma_I^2$, respectively. Assuming that either of two allelic genes (B and b)
may occupy a given locus, the genotype of a diploid organism with
respect to that locus may be BB, Bb, or bb. In the absence of dominance
the effect of the heterozygous genotype is the average of the effects of
the other two genotypes, i.e.

$$y_{Bb} = (y_{BB} + y_{bb})/2 \quad \text{and} \quad y_{BB} - y_{Bb} = y_{Bb} - y_{bb}.$$

If

$$y_{BB} - y_{Bb} \neq y_{Bb} - y_{bb},$$

there is dominance and there will be variance due to dominance
deviations.


*Contribution from the Institute of Statistics of the University of North Carolina, Raleigh, and the
North Carolina Agricultural Experiment Station. Journal paper No. 297.
†Professor and Associate Professor, respectively.
The term epistasis, as used by Wright and others following his usage,
covers all types of interactions among non-allelic genes. If a character
is conditioned by genes at n loci and there is no epistasis,

$$y = y_1 + y_2 + \cdots + y_n$$

where y₁ is the effect of the genotype at the first locus, y₂ is the effect of
the genotype at the second locus, etc. Then $\sigma_I^2 = 0$, i.e., there is no
epistasis. If, in addition, there are no linkages among the n loci, or if
there are linkages but the distribution of genotypes is at equilibrium, then

$$\sigma_g^2 = \sum_{i=1}^{n} \sigma_{g_i}^2 = \sum_{i=1}^{n} \sigma_{A_i}^2 + \sum_{i=1}^{n} \sigma_{D_i}^2 = \sigma_A^2 + \sigma_D^2$$

where $\sigma_{A_i}^2$ is the additive genetic variance and $\sigma_{D_i}^2$ the variance due to
dominance deviations resulting from segregation at the ith locus. This
is the genetic model (no epistasis and either no linkage or the equilibrium
distribution of genotypes with respect to linked loci) to be considered
herein.

Though Wright defined them in slightly different terms, the additive
genetic variance for the ith locus may be defined as the portion of the
variance of genetic effects explained by regression on the number of B
genes in the genotype (or, alternatively, the number of b genes), and the
variance due to dominance deviations as the variance of deviations of
the genetic effects from that regression. Let x be the number of B’s
in the genotype, q the frequency of B in the population, and (1 − q)
the frequency of b in the population. Then, for a population under
random mating, the distribution of genotypes for a single locus will be as
tabulated below.


Genotype   Frequency      x    y             y’
BB         q²             2    z + 2u        u
Bb         2q(1 − q)      1    z + u + au    au
bb         (1 − q)²       0    z             −u


The symbols u and a have the significance of d and h/d, respectively,
in the notation of Fisher et al. [3]. The y’ values are coded y values
obtained by subtracting (z + u) from each. Note that a is a measure of
dominance; it equals zero when dominance is absent and increases in
magnitude as $y_{Bb}$ deviates from the midpoint between $y_{BB}$ and $y_{bb}$.
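Working from the above table we find that

$$\sigma_{g_i}^2 = 2q(1-q)\,[1 + 2(1-2q)a + (1-2q+2q^2)a^2]\,u^2 \qquad (2)$$

$$\mathrm{Cov}_{xy} = 2q(1-q)\,[1 + (1-2q)a]\,u, \qquad \sigma_x^2 = 2q(1-q)$$

$$\sigma_{A_i}^2 = \frac{\mathrm{Cov}_{xy}^2}{\sigma_x^2} = 2q(1-q)\,[1 + 2(1-2q)a + (1-4q+4q^2)a^2]\,u^2 \qquad (3)$$

and finally

$$\sigma_{D_i}^2 = \sigma_{g_i}^2 - \sigma_{A_i}^2 = 4q^2(1-q)^2a^2u^2 \qquad (4)$$

Expressions for $\sigma_{A_i}^2$ and $\sigma_{D_i}^2$ are equivalent to those derived by Wright [9].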
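Equations (2) to (4) can be checked by computing the single-locus variances directly from the genotype table: the additive part is the variance explained by the regression of y’ on x, and the dominance part is what remains. This is a verification sketch; the function names and test values of q, a and u are illustrative.

```python
def by_enumeration(q: float, a: float, u: float):
    """Single-locus variances computed directly from the genotype table."""
    # (frequency, number of B genes x, coded genetic value y')
    table = [(q * q, 2, u), (2 * q * (1 - q), 1, a * u), ((1 - q) ** 2, 0, -u)]
    ex = sum(f * x for f, x, _ in table)
    ey = sum(f * y for f, _, y in table)
    var_x = sum(f * (x - ex) ** 2 for f, x, _ in table)
    var_g = sum(f * (y - ey) ** 2 for f, _, y in table)
    cov_xy = sum(f * (x - ex) * (y - ey) for f, x, y in table)
    var_a = cov_xy ** 2 / var_x            # portion explained by regression on x
    return var_g, cov_xy, var_a, var_g - var_a


def by_formula(q: float, a: float, u: float):
    """The same quantities from equation (2), Cov_xy, equation (3) and equation (4)."""
    h = 2 * q * (1 - q)
    var_g = h * (1 + 2 * (1 - 2 * q) * a + (1 - 2 * q + 2 * q * q) * a * a) * u * u
    cov_xy = h * (1 + (1 - 2 * q) * a) * u
    var_a = h * (1 + 2 * (1 - 2 * q) * a + (1 - 4 * q + 4 * q * q) * a * a) * u * u
    var_d = 4 * q * q * (1 - q) ** 2 * a * a * u * u
    return var_g, cov_xy, var_a, var_d


if __name__ == "__main__":
    for q, a, u in [(0.5, 0.0, 1.0), (0.3, 0.7, 2.0), (0.8, -0.4, 1.5)]:
        print((q, a, u), by_enumeration(q, a, u), by_formula(q, a, u))
```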


Estimation of Average a Based on the Composition of Variance in Popula- 
tions of Biparental Progenies 


Let mn females be taken from a population produced by random 
mating and let them be mated, n to each of m males taken from the same 


TABLE 1
ANALYSIS OF VARIANCE OF A POPULATION OF BIPARENTAL PROGENIES


Source of variance            d.f.          m.s.   Expectation of m.s.
Males                         m − 1         M₁     σ² + kσ_p² + rkσ_f² + nrkσ_m²
Females in males              m(n − 1)      M₂     σ² + kσ_p² + rkσ_f²
Plots in females in males     mn(r − 1)     M₃     σ² + kσ_p²
Within plots                  mnr(k − 1)    M₄     σ²
Total                         mnrk − 1


KEY TO TABLE 1

σ² is the sum of the intra-“plot” environmental variance and the genetic variance
among individuals of the same progeny.

σ_p² is the variance of “plot” effects.

σ_f² is the variance of female effects, and

σ_m² is the variance of male effects.
population. Assume random choice of all individuals involved in the mn
matings. Let rk offspring from each mating be grown, k in each of r
“plots” assigned at random. The variance among the individuals of the
mn matings can be partitioned as indicated in Table 1. The expectations
of the mean squares (see Crump [2]) are also indicated.

The four variance components can all be estimated from appropriate
mean squares. For example,

$$\hat\sigma_m^2 = (M_1 - M_2)/nrk$$


Under the hypothesis of random mating the expected frequencies of 
various types of matings, relative to a single pair of allelic genes, will be 
as listed below. Expected mean y’ values of progeny are listed for each 
mating. 


        Mating
Male    Female    Frequency          Mean y’
BB      BB        q⁴                 u
        Bb        2q³(1 − q)         (u + au)/2
        bb        q²(1 − q)²         au
Bb      BB        2q³(1 − q)         (u + au)/2
        Bb        4q²(1 − q)²        au/2
        bb        2q(1 − q)³         (au − u)/2
bb      BB        q²(1 − q)²         au
        Bb        2q(1 − q)³         (au − u)/2
        bb        (1 − q)⁴           −u


From this the frequencies and expected mean y’ values of progeny for
the three types of males are found to be as follows:


Male    Frequency     Mean y’
BB      q²            qu + (1 − q)au
Bb      2q(1 − q)     [(2q − 1)u + au]/2
bb      (1 − q)²      qau − (1 − q)u
The grand mean of y’ for all progenies is

$$(2q - 1)u + 2q(1-q)au$$

The expected genetic variance (from the pair of alleles under
consideration) among means of the progenies of different males is

$$q^2[qu + (1-q)au]^2 + 2q(1-q)\left[\frac{(2q-1)u + au}{2}\right]^2 + (1-q)^2[qau - (1-q)u]^2 - [(2q-1)u + 2q(1-q)au]^2$$

$$= \tfrac{1}{2}q(1-q)\,[1 + 2(1-2q)a + (1-4q+4q^2)a^2]\,u^2$$

which by reference to equation (3) is seen to be equal to $\sigma_{A_i}^2/4$. Since the
genetic model assumed (a) no epistasis and (b) the equilibrium state
relative to distribution of linked genes, i.e., no correlation among effects
of genotypes at different loci, the total genetic variance among male
progeny means is

$$\sum_i \sigma_{A_i}^2/4 = \sigma_A^2/4 \qquad (5)$$

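The expected genetic variance of means of progenies from different
females but the same male is

$$\begin{aligned} &q^4u^2 + 4q^3(1-q)\left[\frac{u + au}{2}\right]^2 + 2q^2(1-q)^2a^2u^2 + 4q^2(1-q)^2\left[\frac{au}{2}\right]^2 + 4q(1-q)^3\left[\frac{au - u}{2}\right]^2 + (1-q)^4u^2 \\ &\quad - \left\{q^2[qu + (1-q)au]^2 + 2q(1-q)\left[\frac{(2q-1)u + au}{2}\right]^2 + (1-q)^2[qau - (1-q)u]^2\right\} \\ &= \tfrac{1}{2}q(1-q)\,[1 + 2(1-2q)a + (1-2q+2q^2)a^2]\,u^2 \end{aligned}$$

which by reference to equation (2) is seen to be equal to $\sigma_{g_i}^2/4$ or $\sigma_{A_i}^2/4 +
\sigma_{D_i}^2/4$. Thus the total genetic variance among progenies of females
within males is

$$\sum_i \left(\sigma_{A_i}^2/4 + \sigma_{D_i}^2/4\right) = \tfrac{1}{4}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2 \qquad (6)$$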
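The two single-locus results above, σ²_A/4 among male progeny means and σ²_g/4 among females within males, can be confirmed by enumerating all nine mating types with their random-mating frequencies. A verification sketch; the function names and test values of q, a and u are illustrative.

```python
GENOTYPES = (2, 1, 0)   # number of B genes: BB, Bb, bb


def genotype_freq(g: int, q: float) -> float:
    """Random-mating frequency of a genotype carrying g copies of B."""
    return {2: q * q, 1: 2 * q * (1 - q), 0: (1 - q) ** 2}[g]


def progeny_mean(male: int, female: int, a: float, u: float) -> float:
    """Expected coded value y' (u, au, -u for BB, Bb, bb) among progeny of one mating."""
    pm, pf = male / 2, female / 2          # chance that each parent transmits B
    p_BB = pm * pf
    p_bb = (1 - pm) * (1 - pf)
    p_Bb = 1 - p_BB - p_bb
    return p_BB * u + p_Bb * a * u - p_bb * u


def progeny_variances(q: float, a: float, u: float):
    """(variance among male progeny means, average variance among females within males)."""
    male_mean = {m: sum(genotype_freq(f, q) * progeny_mean(m, f, a, u) for f in GENOTYPES)
                 for m in GENOTYPES}
    grand = sum(genotype_freq(m, q) * male_mean[m] for m in GENOTYPES)
    among_males = sum(genotype_freq(m, q) * (male_mean[m] - grand) ** 2 for m in GENOTYPES)
    within_males = sum(genotype_freq(m, q) * genotype_freq(f, q)
                       * (progeny_mean(m, f, a, u) - male_mean[m]) ** 2
                       for m in GENOTYPES for f in GENOTYPES)
    return among_males, within_males


def var_additive(q: float, a: float, u: float) -> float:     # equation (3)
    return 2 * q * (1 - q) * (1 + 2 * (1 - 2 * q) * a + (1 - 4 * q + 4 * q * q) * a * a) * u * u


def var_genetic(q: float, a: float, u: float) -> float:      # equation (2)
    return 2 * q * (1 - q) * (1 + 2 * (1 - 2 * q) * a + (1 - 2 * q + 2 * q * q) * a * a) * u * u


if __name__ == "__main__":
    q, a, u = 0.3, 0.7, 2.0
    print(progeny_variances(q, a, u), var_additive(q, a, u) / 4, var_genetic(q, a, u) / 4)
```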
If there is no maternal influence and if the progenies have been
assigned to blocks at random, $\sigma_m^2$ and $\sigma_f^2$ will contain only the genetic
variance among male progeny means and the genetic variance among the
means of progenies of females within males, respectively. Then from
(5) and (6)

$$\sigma_m^2 = \sigma_A^2/4 \qquad (7)$$

and

$$\sigma_f^2 = \tfrac{1}{4}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2 \qquad (8)$$

The sum of $\sigma_m^2$ and $\sigma_f^2$ is the total of the additive genetic variance and
variance due to dominance deviations among full-sib families, $\tfrac12\sigma_A^2 + \tfrac14\sigma_D^2$,
as reported by Wright [9].

Provided that q has the same value for all gene pairs, estimates of $\sigma_m^2$
and $\sigma_f^2$ furnish information on the magnitude of a. If the population of
progenies described above were in the F₂ generation of a cross between
two isogenic lines, it could be assumed that q = .5 for all loci at which
there was segregation. From (3) and (4), when q = .5,

$$\sigma_{A_i}^2 = \tfrac12 u_i^2, \qquad \sigma_{D_i}^2 = \tfrac14 a_i^2 u_i^2.$$

Then

$$\sigma_A^2 = \tfrac12 \sum u_i^2, \qquad \sigma_D^2 = \tfrac14 \sum a_i^2 u_i^2$$

and

$$\frac{2\sigma_D^2}{\sigma_A^2} = \frac{\sum a_i^2 u_i^2}{\sum u_i^2} = \overline{a^2} \qquad (9)$$

where $\overline{a^2}$ is a weighted mean of the a²’s for all loci, the individual a²’s
being weighted relative to the u²’s for the corresponding loci. In what
follows $(\overline{a^2})^{1/2}$ will be symbolized as “a”.


From (7) and (8), we have

$$\left[\frac{2(\sigma_f^2 - \sigma_m^2)}{\sigma_m^2}\right]^{1/2} = \left(\frac{2\sigma_D^2}{\sigma_A^2}\right)^{1/2} = \text{“a”} \qquad (10)$$

Thus, given data on the sort of population described above, estimates of
$\sigma_m^2$ and $\sigma_f^2$ can be made and these used to estimate “a”.
While “a” exceeds ā unless all a’s are equal, the bias in “a” as an estimate
of ā cannot be very great unless the a’s vary a great deal,¹ and “a” cannot
exceed unity unless one or more of the a’s are larger than one. Therefore,
if the estimate of “a” is significantly greater than one, it can be
concluded that one or more a is greater than one, i.e., that there is
overdominance of genes at one or more loci.
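The meaning of “a” as a u²-weighted root mean square of the aᵢ can be illustrated with a few loci. This sketch assumes an F₂ (q = .5) and uses hypothetical values of uᵢ and aᵢ chosen only for illustration; it builds σ²_m and σ²_f from equations (3), (4), (7) and (8), recovers “a” through equation (10), and compares it with the u²-weighted mean of |aᵢ|, which is taken here as one reading of ā.

```python
import math

# Hypothetical loci in an F2 (q = .5); these u_i and a_i are illustrative, not from the paper
U = [1.0, 0.5, 2.0, 1.2]
A = [0.2, 0.9, 0.6, 1.3]

var_A = 0.5 * sum(u * u for u in U)                          # (3) with q = .5, summed over loci
var_D = 0.25 * sum((a * u) ** 2 for a, u in zip(A, U))        # (4) with q = .5, summed over loci
sigma2_m = var_A / 4                                          # (7)
sigma2_f = var_A / 4 + var_D / 4                              # (8)
a_quoted = math.sqrt(2 * (sigma2_f - sigma2_m) / sigma2_m)    # (10): "a"

weights = [u * u for u in U]
rms_a = math.sqrt(sum(w * a * a for w, a in zip(weights, A)) / sum(weights))
mean_abs_a = sum(w * abs(a) for w, a in zip(weights, A)) / sum(weights)

if __name__ == "__main__":
    print(f'"a" = {a_quoted:.4f}, weighted r.m.s. = {rms_a:.4f}, weighted mean |a| = {mean_abs_a:.4f}')
```

With these values one locus is overdominant (a = 1.3) yet “a” stays below 1, which illustrates why the argument runs in one direction only: an estimate significantly above 1 implies overdominance somewhere, but an estimate below 1 does not rule it out.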

A slightly different sort of population might be set up if the organism
is one in which a female parent can produce progenies by more than one
male. Suppose that from each of n females, progenies are obtained by
each of m male parents. (This is possible, for example, with
multiflowered plants.) The result would be a set of nm progenies, one from
the mating of each of the m males with each of the n females. Assume
s such sets and that k members of each progeny are grown in each of r
“plots”. The form of the analysis of variance for data so obtained is
shown in Table 2.


TABLE 2
ANALYSIS OF VARIANCE FOR DATA OBTAINED FROM PROGENIES PRODUCED BY
MATING EACH OF A SERIES OF MALES TO EACH OF A SERIES OF FEMALES


Source of variance          d.f.                m.s.   Expectation of mean square

Sets of progenies           s − 1
Males in sets               s(m − 1)                   σ² + kσ_p² + rkσ_fm² + rknσ_m²
Females in sets             s(n − 1)                   σ² + kσ_p² + rkσ_fm² + rkmσ_f²
Males × females in sets     s(m − 1)(n − 1)     M₂     σ² + kσ_p² + rkσ_fm²
Plots in progenies          smn(r − 1)          M₃     σ² + kσ_p²
Within plots                smnr(k − 1)         M₄     σ²
Total                       smnrk − 1


σ_fm² is the variance of effects due to interactions among males and females.
Other symbols have the same significance as in Table 1.


¹The bias would become very large if a were positive for some loci and negative for others. Clearly
it is the average absolute magnitude of a that is being estimated, since a² is positive whether a is positive
or negative.
Again assuming random choice of parents from the population
available and random allotment of progenies to “plots”, $\sigma_m^2$, $\sigma_f^2$, and $\sigma_{fm}^2$
will be entirely genetic in origin. Proceeding as before, it can be shown
that in this case

$$\sigma_m^2 = \sigma_f^2 = \sigma_A^2/4$$

and

$$\sigma_{fm}^2 = \sigma_D^2/4$$

Thus from (9) we have

$$\left(\frac{2\sigma_{fm}^2}{\sigma_m^2}\right)^{1/2} = \left(\frac{2\sigma_{fm}^2}{\sigma_f^2}\right)^{1/2} = \text{“a”} \qquad (11)$$
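The estimating equations implied by the expectations of Table 2 can be checked with a round trip: build the expected mean squares from assumed components, invert them, and recover “a” by equation (11). All parameter values below are hypothetical, and the mean-square names (`ms_males`, `ms_females` and so on) are illustrative labels, not the paper's notation.

```python
import math


def expected_mean_squares(var_a, var_d, var_plot, var_within, m, n, r, k):
    """Table 2 expectations, with sigma_m^2 = sigma_f^2 = var_a/4 and sigma_fm^2 = var_d/4."""
    s_m = s_f = var_a / 4
    s_fm = var_d / 4
    ms_plots = var_within + k * var_plot               # plots in progenies
    ms_mxf = ms_plots + r * k * s_fm                   # males x females in sets
    ms_males = ms_mxf + r * k * n * s_m                # males in sets
    ms_females = ms_mxf + r * k * m * s_f              # females in sets
    return ms_males, ms_females, ms_mxf, ms_plots


def estimate_a(ms_males, ms_females, ms_mxf, ms_plots, m, n, r, k):
    """Estimate "a" from both forms of equation (11)."""
    s_m = (ms_males - ms_mxf) / (r * k * n)
    s_f = (ms_females - ms_mxf) / (r * k * m)
    s_fm = (ms_mxf - ms_plots) / (r * k)
    return math.sqrt(2 * s_fm / s_m), math.sqrt(2 * s_fm / s_f)


if __name__ == "__main__":
    design = dict(m=4, n=4, r=2, k=10)
    ms = expected_mean_squares(0.004, 0.003, 0.0015, 0.015, **design)
    print(estimate_a(*ms, **design), math.sqrt(2 * 0.003 / 0.004))
```

Here σ²_fm depends only on the males × females and plots mean squares, which is the point made in the next paragraph about the precision of this design.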


Given the same total number of progenies, $\sigma_A^2$ and $\sigma_D^2$, and hence “a”,
will be estimated more precisely by data from this sort of population
than by data from the first type of population described. The most
obvious advantage lies in the fact that in this case the estimate of $\sigma_D^2$ is a
function of only M₂ and M₃, both of which will have relatively many
degrees of freedom, whereas in the other case it is a function of M₁, M₂,
and M₃.


Numerical Example 


The data to be used is taken from a study with corn reported by 
Robinson et al. [7]. Parent plants were selected at random from the F, 
generation of crosses between long inbred (essentially isogenic) lines. 
Individual plants used as pollen parents (males) were each mated to 
four seed-producing plants (females). Data on progenies produced in 
one of the crosses will suffice to exemplify the estimation procedure. 

The analysis of variance in Table 3 is on yield of grain of 192 progenies 
produced from matings of 48 males with 192 females. The 192 progenies 
are comprised of 48 male groups; the four progenies of a group had a 
common male parent but each progeny had a different female parent. 
The field layout was in blocks containing 32 plots each. The material
was divided into 12 sets of 4 male groups each, and each such set of 16
progenies was assigned to a block and replicated twice within the block.
A new randomization of the 16 progenies was made within each of the
two replications. Each plot consisted of two 10-plant rows. Yield was
measured in pounds of grain produced by 10 guarded plants (plants 
having another plant on each side in the row). 
TABLE 3
ANALYSIS OF VARIANCE OF GRAIN YIELD OF 192 BIPARENTAL PROGENIES

Source of variance | d.f. | m.s. | m.s. expectations (see Table 1)
Blocks | 11 | .153 |
Replications in blocks | 12 | .063 |
Males in blocks | 36 | .167 = $M_1$ | $\sigma^2 + 10\sigma^2_e + 20\sigma^2_f + 80\sigma^2_m$
Females in males in blocks | 144 | .069 = $M_2$ | $\sigma^2 + 10\sigma^2_e + 20\sigma^2_f$
Males × replications in blocks | 36 | .031* = $M_3$ | $\sigma^2 + 10\sigma^2_e$
Females in males × replications in blocks | 144 | |
Within plots | 207 | .0153† = $M_4$ | $\sigma^2$


* The two interaction mean squares were pooled for a single progeny × replication mean square.
This is a rather common procedure. The underlying assumption is that intra-block genotype × replication
interaction was unimportant in magnitude. The point was supported by the fact that the Males ×
Replication mean square was actually the smaller.

† This estimate was obtained from data on only a portion of the individual plants. Individual
plant data were taken only in about every 12th plot.


From the mean squares in Table 3 variance components are estimated 
as follows: 


Variance component | Estimate
$\sigma^2_m$ | $(M_1 - M_2)/80 = .001225$
$\sigma^2_f$ | $(M_2 - M_3)/20 = .0019$
$\sigma^2_e$ | $(M_3 - M_4)/10 = .00157$
$\sigma^2$ | $M_4 = .0153$
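The divisions above amount to solving the m.s. expectations of Table 3 from the bottom up. A minimal Python sketch, assuming r = 2 replications, k = 10 plants per plot and n = 4 females per male as in the layout described; the function name is illustrative only:

```python
def design1_components(M1, M2, M3, M4, r=2, k=10, n=4):
    """Solve the m.s. expectations of Table 3 for the variance components.

    r = replications, k = plants per plot, n = females per male.
    """
    s2_m = (M1 - M2) / (r * k * n)   # males:            divisor 80
    s2_f = (M2 - M3) / (r * k)       # females in males: divisor 20
    s2_e = (M3 - M4) / k             # plots:            divisor 10
    s2_w = M4                        # within plots
    return s2_m, s2_f, s2_e, s2_w


# Mean squares M1..M4 from Table 3.
components = design1_components(0.167, 0.069, 0.031, 0.0153)
print([round(c, 6) for c in components])  # [0.001225, 0.0019, 0.00157, 0.0153]
```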
Then using the estimates of $\sigma^2_m$ and $\sigma^2_f$ in accordance with equation (10)
we obtain

$$\left[\frac{2(.0019 - .001225)}{.001225}\right]^{1/2} = 1.05$$

as an estimate of "a".
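Equation (10) can be evaluated directly from the two components. This small Python sketch (the function name is hypothetical) also guards against the case, possible through sampling error, in which the estimate of $\sigma^2_f$ falls below that of $\sigma^2_m$ and the square root is undefined:

```python
import math


def degree_of_dominance(s2_m, s2_f):
    """Equation (10), Design I: "a" = [2(s2_f - s2_m) / s2_m] ** 0.5."""
    ratio = 2.0 * (s2_f - s2_m) / s2_m
    if ratio < 0:
        raise ValueError("estimate of sigma2_f is below sigma2_m; 'a' undefined")
    return math.sqrt(ratio)


a_hat = degree_of_dominance(0.001225, 0.0019)
print(round(a_hat, 2))  # 1.05
```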

The above data also furnish the material for estimating heritability.
Heritability is the additive genetic variance as a fraction of the phenotypic variance. Hence, since $\sigma^2_A = 4\sigma^2_m$,

$$\frac{4(.001225)}{.001225 + .0019 + .00157 + .0153} = .245$$

is an estimate of the heritability of the variance among individual plants.
Numerous estimates of the heritability of animal characteristics have
been made in this way (for an example see Hetzer et al. [4]).²
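The heritability arithmetic, written out as a short Python sketch under the same assumptions: $\sigma^2_A = 4\sigma^2_m$, and the phenotypic variance of a single plant is the sum of the four estimated components.

```python
def heritability(s2_m, s2_f, s2_e, s2_w):
    """Additive variance as a fraction of the variance among single plants."""
    additive = 4.0 * s2_m                     # sigma2_A = 4 * sigma2_m
    phenotypic = s2_m + s2_f + s2_e + s2_w    # sum of all four components
    return additive / phenotypic


h2 = heritability(0.001225, 0.0019, 0.00157, 0.0153)
print(round(h2, 3))  # 0.245
```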


Significance Test for the Deviation of the Estimate of "a" from any Hypothetical Value


An estimate of "a" will have its greatest value only if the probability
that its deviation from critical hypothetical values is of chance origin 
can be established. If it is significantly greater than zero, a degree of 
dominance in the action of genes conditioning the character in question 
is indicated. If it is significantly greater than one, over-dominance is 
indicated. 

An approximate F test³ can be made in the following manner. Considering the variance analysis for the first type of population discussed,
the expected value of $M_2$ assuming a specific value of "a" can be estimated as a linear function of $M_1$ and $M_3$. From equation (10),

$$\frac{2\sigma^2_f}{\sigma^2_m} = 2 + (\text{"a"})^2$$

Then $M_2'$, the estimate of the expected value of $M_2$, is

$$M_2' = \frac{2 + (\text{"a"})^2}{2 + (\text{"a"})^2 + 2n}\,M_1 + \frac{2n}{2 + (\text{"a"})^2 + 2n}\,M_3 \qquad (12)$$


² Estimates made in this manner from data obtained at one location in a single year will over-evaluate genetic improvement possible through selection if there are important interactions of genotype
with the variations in environment that occur from year to year and between locations within the 
area in which an improved strain or variety is to be used. Such interactions are known to be important 
for many plants and estimates of their magnitudes are needed. 

³ Suggested by W. G. Cochran of the Institute of Statistics of The University of North Carolina,
Raleigh. He states that the test outlined is reasonably good but that work directed toward finding a 
more precise one is in progress. 
in which "a" is given the hypothetical value against which the estimated
value is to be tested. The test is based on a comparison of $M_2'$ with the
observed $M_2$. Three types of cases should be distinguished.


1. The object is to test whether the estimate of "a" is significantly
larger than a specified hypothetical value. For example, in testing
for over-dominance we wish to know if our estimate of "a" is significantly larger than 1.0. Since the expected value of $M_2$ increases as
"a" becomes larger, F will in this case be computed as $M_2/M_2'$.

2. The object is to test whether the estimate of "a" is significantly
smaller than a specified hypothetical value. For example, if we want
to establish that there are loci at which there is either no dominance
or only partial dominance, we are concerned with whether our estimate of "a" is significantly smaller than 1.0. Because smaller values
of "a" mean smaller expected values of $M_2$, F must in this case be
computed as $M_2'/M_2$.

3. The object is to test whether the estimate of "a" deviates significantly
(in either direction) from a specified hypothetical value. In this case
F is taken as $M_2'/M_2$ if $M_2' > M_2$, and as $M_2/M_2'$ if $M_2 > M_2'$.


In cases 1 and 2 the probability of F is taken from the standard F table
in the usual way. In case 3, a two-tailed F test is involved and hence the
probability from the F table must be doubled. Degrees of freedom are
assigned $M_2'$ in the manner described by Satterthwaite [8]. However,
since quantities such as $M_2'$ are not distributed precisely like ordinary
mean squares, the test is clearly not an exact one.
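The pieces of the approximate test can be sketched in Python as three small helpers: equation (12) for $M_2'$, Satterthwaite's degrees of freedom for a positive linear combination of mean squares, and the choice of F for each of the three cases. The helper names are hypothetical, and the F-table lookup is left out. The numerator degrees of freedom are those of $M_2$ when $M_2$ is the numerator, and those assigned $M_2'$ otherwise.

```python
def expected_M2(M1, M3, a, n):
    """Equation (12): M2', the estimate of E(M2) for a hypothetical "a".

    Returns M2' and the coefficients (c1, c3) of M1 and M3.
    """
    w = 2.0 + a * a
    c1 = w / (w + 2 * n)        # coefficient of M1
    c3 = 2 * n / (w + 2 * n)    # coefficient of M3
    return c1 * M1 + c3 * M3, (c1, c3)


def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate d.f. for sum(c_i * M_i), all c_i positive."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    return sum(terms) ** 2 / sum(t * t / f for t, f in zip(terms, dfs))


def f_ratio(M2_obs, M2_hyp, case):
    """F for case 1 ("a" larger), 2 ("a" smaller) or 3 (either direction).

    Returns (F, observed_is_numerator). For case 3 the tabled
    probability must be doubled.
    """
    if case == 1:
        return M2_obs / M2_hyp, True
    if case == 2:
        return M2_hyp / M2_obs, False
    if case == 3:
        if M2_obs >= M2_hyp:
            return M2_obs / M2_hyp, True
        return M2_hyp / M2_obs, False
    raise ValueError("case must be 1, 2 or 3")
```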

It may be noted that an alternative procedure involving comparison
of $M_1$ with an estimate, $M_1'$, of its expected value based on $M_2$ and $M_3$
could be used. However, the expression for $M_1'$ would involve the difference between $M_2$ multiplied by a constant and $M_3$ multiplied by another
constant, and it is known from theory that the distribution of a linear
function of mean squares deviates further from that of a single mean
square when the function involves differences than when it does not.

Using the data of Table 3 in testing the divergence of the estimate,
1.05, of "a" from zero, we find

$$M_2' = \frac{2}{10}(.167) + \frac{8}{10}(.031) = .058$$


Degrees of freedom assigned $M_2'$ are

$$\frac{(c_1 M_1 + c_3 M_3)^2}{\dfrac{(c_1 M_1)^2}{f_1} + \dfrac{(c_3 M_3)^2}{f_3}} = 98$$

where $c_1$ and $c_3$ are the coefficients of $M_1$ and $M_3$ in (12), and $f_1$ and $f_3$ are
the degrees of freedom of $M_1$ and $M_3$.
F = 1.19, which is non-significant for 144 and 98 degrees of freedom.
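The worked test can be reproduced in a few lines of Python. One assumption is needed: $M_3$ in Table 3 pools two interaction mean squares, so it is given 36 + 144 = 180 degrees of freedom. With that value the Satterthwaite formula returns the 98 quoted above.

```python
# Testing "a" = 0 in Design I with the Table 3 mean squares.
M1, M2, M3 = 0.167, 0.069, 0.031
f1, f2, f3 = 36, 144, 36 + 144   # M3 pools two interactions: 180 d.f.
n, a0 = 4, 0.0

w = 2.0 + a0 ** 2
c1, c3 = w / (w + 2 * n), 2 * n / (w + 2 * n)   # equation (12): 0.2 and 0.8
M2_hyp = c1 * M1 + c3 * M3                      # about .058

F = M2 / M2_hyp                                 # case 1: is "a" above 0?
df_hyp = (c1 * M1 + c3 * M3) ** 2 / ((c1 * M1) ** 2 / f1 + (c3 * M3) ** 2 / f3)
print(round(F, 2), round(df_hyp))  # 1.19 98
```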
In the case of the analysis of variance in Table 2 the test of "a"
would be an F test of $M_3$ against its expected value based on $M_1$, $M_2$,
and $M_4$. If m and n were equal, $M_3'$ would be as follows:

$$M_3' = \frac{(\text{"a"})^2}{2n + (\text{"a"})^2}\cdot\frac{M_1 + M_2}{2} + \frac{2n}{2n + (\text{"a"})^2}\,M_4$$

When the deviation of "a" from zero is being tested this reduces to $M_4$,
and the test becomes the exact test of $F = M_3/M_4$.
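Under the same symmetry assumption for the Design II expectations (m = n, so $\sigma^2_m = \sigma^2_f$), the expression for $M_3'$ can be checked numerically. If the mean squares are built from hypothetical components with a known "a", $M_3'$ should reproduce $M_3$, and with "a" = 0 it should collapse to $M_4$. The function name is illustrative.

```python
def expected_M3_design2(M1, M2, M4, a, n):
    """M3', the estimate of E(M3) in Design II with m = n, for hypothetical "a"."""
    w = a * a
    return w / (2 * n + w) * (M1 + M2) / 2 + 2 * n / (2 * n + w) * M4


# Consistency check with hypothetical components (not data from the paper).
r, k, n = 2, 10, 4
s2, s2_e, s2_m, a = 0.01, 0.002, 0.003, 1.5
M4 = s2 + k * s2_e
M3 = M4 + r * k * (a ** 2 / 2) * s2_m   # r*k*sigma2_fm, sigma2_fm = a^2 s2_m / 2
M1 = M2 = M3 + r * k * n * s2_m
print(expected_M3_design2(M1, M2, M4, a, n), M3)
```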

It should be noted that a rather large amount of data is required for a
reasonably good estimate of "a". Robinson et al. [7] in the study referred
to above had 146 and 518 degrees of freedom for $M_1$ and $M_2$, respectively,
yet an estimate of 1.64 had a probability of about .05 assuming
the true value of "a" was 1.0. If it had been possible to utilize the alternative type of population described, and this had been done using
sets of 16 progenies produced by mating each of 4 females with each of
4 males, the probability of "a" = 1.64 being greater than 1.0 would have
been below .01, assuming that all variance estimates had turned out the
same.


DISCUSSION 


The foregoing has assumed (a) no epistasis, and (b) equilibrium with 
respect to segregation of linked genes. In many, if not most, instances 
neither assumption will be strictly valid. It appears to the authors that 
the presence of genetic variance due to epistatic deviations will cause 
upward bias in the estimate of ‘‘a’’ because this variance is distributed 
among the mean squares in somewhat the same manner as that due to 
dominance deviations. The probable size of such bias is not clear at this 
time. It is known that the epistatic variance arising from some types
of non-allelic gene interactions is small relative to $\sigma^2_A$ but that in certain
genetic situations it can be large, Lush [6]. Further, in the case of at
least some types of epistasis the fraction of the epistatic variance contained in $\sigma^2_m$ (or $\sigma^2_f$, in the case of the second type of population discussed)
is considerably smaller than that for $\sigma^2_f$ (or $\sigma^2_{fm}$). Further work is needed in
evaluation of bias possible as a consequence of epistasis and in devising 
techniques for determining when a serious amount of epistatic variance 
is involved. 

Coupling phase linkages would not be a source of bias in estimates of
"a". On the other hand, the net effect of genes tightly linked in repulsion could be the same as for over-dominance in the action of independently segregating genes even though none of the linked genes were
individually more than partially dominant to their alleles. As a result
it would appear that estimates of "a" > 1.0 might well be obtained from
data collected in an early generation following a cross of genetically 
divergent material even though there were no over-dominance in the 
action of the individual gene pairs. In that event somewhat smaller 
estimates would be expected in later generations, though it is possible 
that this linkage effect would always be present to a degree. However, 
whether apparent over-dominance is an attribute of the individual gene 
pairs or a consequence of linkage it poses problems relative to methods 
for genetic improvement, Hull [5], and Comstock et al. [1]. One differ- 
ence will be that, if what is measured as overdominance results largely 
from linkage, information on its magnitude must be interpreted in the 
light of the history of the material from which that information was 
obtained and applied with consideration to the history of the material 
on which one is attempting to work genetic improvement. 


SUMMARY 

Two sorts of populations of biparental progenies from which data 
can be used for the estimation of the degree of dominance in the action 
of genes have been described. The mathematical basis for the estimation methods has been presented in detail together with some discussion
of limiting assumptions involved. A numerical example of the arith- 
metic procedures is given and an approximate test of significance of the 
estimate of dominance is outlined. 
QUERIES


QUERY 61: A problem was recently referred to me for criticism and
it now seems that a further opinion is needed. I would appreciate
your appraisal of the problem. Here is the problem as presented
to me.

In a cattle feeding experiment, twelve rations were tried on 12 lots of 
animals in each of two replications. There were two animals per plot. 
The twelve rations consisted of all combinations of four winter rations 
and three summer rations. Table 1 is the analysis of variance presented 
to me. 


TABLE 1
ANALYSIS OF VARIANCE SUMMARY

Source of Variation | D/F | MS | F
Total | 47 | |
Btw. winter rations | 3 | 9,388.06 | 1.02
Btw. summer rations | 2 | 3,308.77 | 2.88
Btw. reps | 1 | 28,226.97 | 1.64
W × S | 6 | 9,528.99 | 4.85*
W × Reps | 3 | 1,862.38 | 24.81*
S × Reps | 2 | 52,581.43 | 1.14
W × S × Reps | 6 | 46,198.99 | 4.00**
Error | 24 | 11,558.92 |


My opinion is that the experiment as set up has 24 plots (only 2 reps
are accounted for in the summary above) and the analysis of variance
summary should be:

Total
Winter
Summer
Reps
W × S
W × R
S × R
W × S × R


It seems to me that if all 48 animals were considered separate plots, 
then there are 4 reps and the summary of analysis of variance would be: 
Total
Winter
Summer
Reps
W × S
W × R
S × R
W × S × R


Third, if each animal is considered to be a sub-sample of a plot value,
the analysis could be:

Total
Winter
Summer
Reps
Sub-samples
W × S
W × Reps
W × SS
W × S × R × SS   6


From this point of view, the summary originally presented to me was 
satisfactory provided that the difference between sub-samples and all 
interactions in which sub-samples are involved are not significant. From 
a look at the data, this does not seem to be true. Also, the experiment 
apparently does not consider the separate animals as sub-samples. 

I would appreciate your opinion very much. 


ANSWER: None of the suggested analyses appears to be correct. If
there was some real distinction between the replicates, and
if the treatments were distributed at random throughout
each replicate, then the analysis of variance of Table 2 is appropriate.

The experimental error mean square is appropriate for testing each 
of the effects preceding it in the table. None of the mean squares is 
greater than error so that the treatments are clearly without effect in 
differentiating the gains in weight. 

It may be well to call attention to the fact that, in the table furnished 
you, most of the values of F are suspect. The sets of hypotheses which 
may be tested, together with corresponding computations and methods 
of using the F-table, have been discussed in this column before (Vol. 1, 
page 70; Vol. 2, p. 56). The structure and conduct of this experiment 
TABLE 2
ANALYSIS OF VARIANCE OF CATTLE FEEDING DATA.
RANDOM SAMPLING FROM TWO DISTINCT REPLICATES.

Source of Variation | Degrees of Freedom | Mean Square
Replications | 1 | 28,227
Treatments: | |
  Winter | 3 | 9,388
  Summer | 2 | 3,309
  Winter × Summer | 6 | 9,529
Experimental Error | 11 | 35,268
Animals within lots | 24 | 11,559


indicate the test of the hypothesis that treatments are not effective in 
producing differences among the gains. In making this test on the winter 
rations, for example, the appropriate value of F is 9,388/35,268 = 0.27.
The F-table shows that, with degrees of freedom 3 and 11, no value of F 
less than 3.59 is significant at the 5% point. In these circumstances, it 
is inappropriate to refer to the tabulated values of F if the sample value 
is less than one; that is, if the treatment mean square is less than that 
for error. 
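The experimental error of Table 2 is the pooled treatment × replication interaction from the questioner's table (3 + 2 + 6 = 11 degrees of freedom). A short Python sketch of the pooling and of the F ratios discussed, using the mean squares as printed in the query:

```python
# Pool the treatment x replication interactions of the questioner's table
# into the experimental error of Table 2.
interactions = [(3, 1862.38), (2, 52581.43), (6, 46198.99)]  # (d.f., m.s.)
err_df = sum(df for df, _ in interactions)
err_ms = sum(df * ms for df, ms in interactions) / err_df    # about 35,268

treatments = {
    "Winter": (3, 9388.06),
    "Summer": (2, 3308.77),
    "Winter x Summer": (6, 9528.99),
}
for name, (df, ms) in treatments.items():
    print(f"{name}: F = {ms / err_ms:.2f} with {df} and {err_df} d.f.")

# Experimental error against animals within lots (24 d.f.).
print(f"Lots vs animals: F = {err_ms / 11558.92:.2f}")
```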

From the viewpoint of experimental design, it is interesting to ob- 
serve the highly significant ($F$ = 35,268/11,559 = 3.05; $F_{.01}$ = 3.09)
intraclass correlation among the gains of the two animals per lot. This 
lack of independence of the gains may be due to the confinement of each 
pair of animals in a common pen. If so, this is a striking illustration of 
one danger inherent in the all-too-common practice of housing together 
animals receiving the same treatment. Other difficulties were dis- 
cussed in query number 60 in the preceding number of this JOURNAL.


Daniel G. Horvitz


QUERY 62: Forty Hereford heifers were divided equally into four
separate lots of ten each, and fed the drugs under study by mixing
these substances in the grain ration. Each lot of cattle was self-fed the grain ration. The breeding efficiency and fertility were studied
through exposing the heifers in each lot to each of two bulls (5 heifers
bred to each bull). The breeding program was initiated 149 days after
the start of the drug feeding period and continued for 13 weeks, at which
time the heifers were slaughtered.

It should be pointed out that in Lot II, two heifers were found to 
have abnormal reproductive organs and were diagnosed as freemartins. 
These heifers were never in heat and therefore did not influence the 
number of services per conception. With these facts in mind, perhaps
these two animals should not be included in the data and this lot (Lot II) 
be considered as having 8 heifers instead of 10. 

I am wondering if there is some means of analyzing these data in 
order to see if there is significant difference between the number of 
services per conception for the various lots. 

Table 1 gives the information on breeding of each animal and also 
for each lot. 


TABLE 1
DATA ON BREEDING IN FOUR LOTS OF HEIFERS

 | Lot 1 | Lot 2 | Lot 3 | Lot 4
 | Control | Arsenic Trioxide | Nux-vomica | Thiouracil
Pregnancies after 1 service | 8 | 5 | 8 | 7
Pregnancies after 2 services | 1 | 0 | 0 | 3
Pregnancies after 3 services | 0 | 1 | 1 | 0
Not pregnant: 1 service | 1 | 1 | 0 | 0
Not pregnant: 2 services | 0 | 1 | 1 | 0
Freemartins | 0 | 2 | 0 | 0
Total | 10 | 10 | 10 | 10


ANSWER: The answer to your question is clear, but merely to answer
it would, I fear, be a disservice because the answer is
apparently not pertinent to your problem. The answer:
In each lot you have a frequency distribution showing the numbers of 
pregnancies following 1, 2 and 3 services. Following the customary 
methods of computation in frequency distributions, you can get the 
mean and the variance for each lot. These can then be combined into 
the usual analysis of variance for groups. The method is outlined in 
examples 8.8 and 10.16 of the 4th Edition of my text, and is used in 
query number 61, which precedes this one. I carried through the test 
and got F = 0.24. 
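The computation referred to can be sketched as a one-way analysis of variance of the number of services per pregnancy, using only the heifers that became pregnant. Which animals were included is an assumption of this sketch, but it reproduces F = 0.24 with 3 and 30 degrees of freedom:

```python
# Services per pregnancy for the pregnant heifers in each lot (Table 1).
lots = {
    "Control":          [1] * 8 + [2] * 1,
    "Arsenic Trioxide": [1] * 5 + [3] * 1,
    "Nux-vomica":       [1] * 8 + [3] * 1,
    "Thiouracil":       [1] * 7 + [2] * 3,
}


def one_way_anova(groups):
    """Return (F, between d.f., within d.f.) for a list of groups."""
    values = [x for g in groups for x in g]
    N, G = len(values), len(groups)
    correction = sum(values) ** 2 / N
    ss_total = sum(x * x for x in values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    F = (ss_between / (G - 1)) / (ss_within / (N - G))
    return F, G - 1, N - G


F, df1, df2 = one_way_anova(list(lots.values()))
print(round(F, 2), df1, df2)  # 0.24 3 30
```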
To see why I think this would be irrelevant, consider two lots, in one
of which there occurred only a single pregnancy, this following a single 
service; while in the other lot 10 pregnancies resulted from 10 services. 
In each lot there is one service per pregnancy, but I question whether 
you would consider the breeding efficiency or the fertility the same. 

I have assumed that you used the words "pregnancy" and "conception" synonymously, yet in each of three lots there were heifers not
pregnant after one or two services. Presumably they were not serviced 
again because they were not in heat. Can it be that conception took 
place followed by abortion? Can this type of historical data be learned 
from the postmortem? 

As for freemartins, I agree that the postmortem dictates their elimi- 
nation from the experiment. In fact, I am surprised that you included 
them originally. 

It is not clear what effect of treatment you wish to evaluate. It 
might be the number of pregnancies following the first service (or the 
second, or the third, or some combination of them), it might be the 
number of sterile heifers or the number of abortions, or it might be some 
combination of all these. It might even be the number of services per 
pregnancy, especially if this were considered in the light of other 
information. 

It seems to me that any result you get may be ambiguous. How can 
you distinguish between the effects of treatments and the effects of bulls? 
The description of this part of the experiment is not clear to me. 

From the foregoing you will see that any exact evaluation of your 
results is difficult if not impossible. However, I am willing to hazard a 
guess! Considering the small size of your samples, the uniformity of 
your results appeals to me as notable. I guess that the treatments are 
not significantly different. 

W. SNEDECOR 


QUERY 63: Recently, we received data from a field veterinarian
relative to the healing qualities of two drugs. In the data shown
in Table I, how would you estimate whether or not there is a
statistically significant difference between the healing qualities of the
two drugs?
TABLE I

Healing | No. of Animals: Drug A | Drug B
Poor | 1 | 1
Fair | 6 | 4
Good | 16 | 6
Excellent | 7 | 5
Total | 30 | 16


ANSWER: Assuming random sampling from normal distributions,
the t-test is applicable. If it is further assumed that the
degrees of healing can be measured by equally spaced
numbers, it is convenient to assign the integers 1, 2, 3, 4 to the four
categories poor, fair, good, excellent. The means for drugs A and B
are then calculated from the two frequency distributions. They turn
out to be 2.97 and 2.94, so nearly the same that no test of significance
is required. But for illustration, the two sums of squares of deviations
from means are computed, 16.67 and 12.82, the sum being 29.49. The
formula for t is, for this group comparison,

$$t = (\bar{x}_1 - \bar{x}_2)\left[\frac{n_1 n_2 (n_1 + n_2 - 2)}{(n_1 + n_2)\sum x^2}\right]^{1/2}$$

where $n_1$ and $n_2$ are the two sample sizes and $\sum x^2$ is the pooled sum of
squares.

Substituting:

$$t = (2.97 - 2.94)\left[\frac{(30)(16)(44)}{(46)(29.49)}\right]^{1/2} = 0.12$$

with 44 degrees of freedom.
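The arithmetic can be followed in Python by scoring the categories 1 to 4 as described. The Excellent count for drug A is taken as 7, the value implied by the total of 30. Computed this way, the sums of squares come out at about 16.97 and 12.94 (sum 29.90) rather than 16.67 and 12.82, but t is still only about 0.11, so the conclusion is unchanged.

```python
import math

scores = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}
drug_a = {"Poor": 1, "Fair": 6, "Good": 16, "Excellent": 7}  # 7 implied by total 30
drug_b = {"Poor": 1, "Fair": 4, "Good": 6, "Excellent": 5}


def mean_and_ss(freqs):
    """Size, mean and sum of squared deviations of a scored frequency table."""
    n = sum(freqs.values())
    s = sum(scores[c] * f for c, f in freqs.items())
    s2 = sum(scores[c] ** 2 * f for c, f in freqs.items())
    return n, s / n, s2 - s * s / n


n1, m1, ss1 = mean_and_ss(drug_a)
n2, m2, ss2 = mean_and_ss(drug_b)
# Pooled-variance t for two independent groups, n1 + n2 - 2 d.f.
t = (m1 - m2) * math.sqrt(n1 * n2 * (n1 + n2 - 2) / ((n1 + n2) * (ss1 + ss2)))
print(round(m1, 3), round(m2, 4), round(ss1, 2), round(ss2, 2), round(t, 3), n1 + n2 - 2)
```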


GEORGE W. SNEDECOR 


THE BIOMETRIC SOCIETY 


One year after its formation at the First International Biometric 
Conference in Woods Hole, the Biometric Society has a total membership 
of 673 (as of October 22). Most of these are affiliated with an organized 
region, 362 with the Eastern North American Region (ENAR), 103 
with the British Region and 55 with the Western North American Re- 
gion. The organization of an Australian Region will be completed next 
January, if not before, and so far has enrolled 23 members. The other 
130 members live in areas which have not yet been organized or are 
members-at-large. We hope that some of them will be provided with a 
home before many months have passed. Both a Western European and 
an Indian Region are about to be formed. 

Members of the Society and their colleagues will want to start plan- 
ning for the Second International Biometric Conference. This is planned 
for Geneva, Switzerland, late next summer, probably in the period from 
August 30 to September 2. The University of Geneva will be our host 
and Professor Arthur Linder assures us that accommodations will be 
available for every income level. A special organizing committee for 
the conference will be formed in the near future. Travel agencies urge 
that you reserve steamship accommodations at once if there is any 
chance of your being able to attend. Steamship reservations for early 
next summer are scarce. Making a reservation now will not obligate 
you in any way and later it may be extremely difficult or impossible to 
obtain the type of accommodation you want.

Our conference precedes the next session of the International
Statistical Institute, which will be held by invitation of the Swiss govern- 
ment at Berne or Luzerne from September 4 to 10. By invitation, the 
Biometric Society has applied for affiliation with the International 
Statistical Institute. This will facilitate the coordination of our pro- 
grams and minimize the risk of overlooking any vital aspect of our field. 
Biometry was accepted provisionally at a meeting in UNESCO House 
in Paris on October 4 and 5 as a section of the International Union of 
the Biological Sciences. In view of this action the road is now open for 
UNESCO support for our conference in Geneva. 

In carrying out the requirements of the constitution, the Council 
has made several decisions as to procedure. To insure their ready 
availability, these rules have been formulated in a series of "Council
By-Laws" which are now in process of final revision and adoption.
They concern finances, the relation of the Regions to the Society, regional 
officers and dues, nominations for Council, Council elections and inter-
national conferences. It is hoped in this way to provide a permanent 
record for the guidance of all officers of the Society rather than to leave 
these details to chance. 

The application of the Eastern North American Region (ENAR) for 
membership in the Division of Biology and Agriculture of the National 
Research Council has been accepted. William G. Cochran has been 
named to represent the Society in the Division. 

In addition to the regional meeting of ENAR with the American 
Public Health Association in Boston in November, another joint session 
was arranged with the American Association of Economic Entomologists 
for the evening of December 13 at their annual meeting in New York. 
The session consisted of an informal biometric "clinic" on entomological
problems submitted by the entomologists. A panel of statistical and 
biometrical “experts” had the job of answering these questions to the 
satisfaction of the entomologists. The most extended series of programs 
of ENAR were those arranged jointly with the Biometrics Section of the 
American Statistical Association at the annual meeting in Cleveland in 
late December. They will be reported in the next issue of BIOMETRICS.

The By-Laws adopted by the British Region on March 31 and since 
approved by the Council are as follows: 


“As a division of the Biometric Society, the British Region is 
governed by the Constitution of the Society and the following 
Regional Rules. 

1. The region shall endeavour to promote quantitative biology 
in all its aspects. 

2. Membership of the Region is open to residents in the British
Isles, and to British scientists resident in other countries. 

3. The business of the Region shall be conducted by a Com- 
mittee consisting of the Vice-President for the Region, the Secretary, 
Treasurer, and six Ordinary Members together with any Members 
of the General Council of the Society who belong to the Region. 
Four shall form a quorum. 

4. The Vice-President, Secretary and Treasurer shall retire 
annually but shall be eligible for re-election. Two ordinary members 
shall retire each year by seniority in order of election and shall not 
be eligible for re-election to ordinary membership of the Committee 
until a year has elapsed. The Committee shall have power to fill 
casual vacancies in their number, subject to the approval of the 
next Annual Meeting. Any member so appointed to a casual va- 
cancy shall hold office only for the unexpired term of his predecessor;
but shall be eligible for immediate re-election. 

5. There shall be an Annual Meeting of the Region during the 
months of March and April, and such other meetings as the Com- 
mittee may decide. At least ten days notice shall be given of all 
meetings, other than the Annual Meeting or Special Meetings. 

6. At least six weeks notice shall be given of the Annual Meet- 
ing. At the same time there shall be sent to each member a list of 
the Regional Officers and Members of Committee indicating those 
due to retire, and inviting nominations to fill the vacancies. Nom- 
inations must be received by the Secretary at least four weeks before 
the date of the Annual meeting, and must be signed by at least two 
members of the Region and must be accompanied by a declaration, 
signed by the nominee, that he is willing to serve if elected. 

7. No ballot shall be taken if the nominations are insufficient 
or just sufficient to fill the vacancies, and in the former case the 
Committee shall make such additional nominations as are required 
to fill the remaining vacancies. All the persons so nominated shall 
be deemed elected, pending action by the Council. 

8. If there should be more than one nomination for any office, 
or more nominations for ordinary membership of the Committee 
than there are vacancies, a postal ballot shall be held. At least 
two weeks before the Annual Meeting each member shall be sent a 
ballot paper containing a list of those vacancies for which a ballot 
is to be held, together with the names of the persons validly nomi- 
nated to fill them. The ballot papers shall be returned not later 
than the commencement of the Annual Meeting, at which a count 
shall be made and the result declared. 

9. A Special Meeting shall be convened within eight weeks of the 
receipt by the Secretary of a request signed by not less than twelve 
members. A notice stating the purpose of the meeting shall be sent 
to each member not less than two weeks before it is to be held. 

10. The Committee shall receive nominations of candidates for 
membership of the region, and shall forward those deemed appropri- 
ate to the General Council. 

11. A member may be expelled from the Region only on the 
proposal of the Committee and by a majority of two-thirds of those 
present and voting at an Annual Meeting, after at least two weeks 
notice has been given. 

12. The subscription of each member shall be £1, which becomes 
due on election to the Region, and subsequently on 1st February
each year. From the fund so constituted the contributions to the 
general funds of the Society shall be paid and secretarial expenses 
and other such costs as the proper conduct of the Society demands 
may be defrayed. 

13. The membership of any member who is three or more years 
in arrears with his subscription may be terminated on a vote of the 
Committee. No application to rejoin the Society shall be enter- 
tained until the unpaid subscriptions have been discharged. 

14. A statement of the Region’s finances shall be presented to 
the Annual Meeting by the Treasurer, after the accounts have been 
audited by an Hon. Auditor appointed at the previous Annual 
Meeting for the purpose. The Hon. Auditor shall not be an 
Officer or Member of the Committee. 

15. These rules may be amended at an Annual Meeting or a Spe- 
cial Meeting convened for the purpose, after six weeks notice has 
been given, by a majority of two-thirds of those voting, a postal 
ballot being taken of the same kind as for the election of Officers."


It is with great sorrow that we report the loss of one of our most 
active and able members. Professor John H. Watkins of Yale Uni- 
versity died suddenly on September 25. As Secretary-Treasurer of 
ENAR and at the same time of the Biometrics Section of the American 
Statistical Association, he has done invaluable service in coordinating 
the activities of these two organizations. His work on hospitalization 
statistics for the Army in World War II and on hospital and public 
health statistics in New Haven before and after the war was outstanding. 
He will be sorely missed by his many friends and colleagues. 
NEWS AND NOTES


CANADA—G. C. Ashton, Assistant Professor of Nutrition, Mac- 
Donald College, Quebec, has made a few comments regarding the value 
of statistical methods for research in Nutrition. "Modern statistics provide a means of condensing masses of nutrition data which allows
interpretation of same which would otherwise be impossible. One of the 
most characteristic attributes of biological organisms is their variability. 
Nutrition research is no less affected by this quality than other branches 
of biology and one of statistics’ great aids in this field of research has 
been to provide an effective means of measuring this variability. Sta- 
tistical development has aided in indicating the most suitable experi- 
mental designs with which to resolve nutritional problems. With these 
designs, normal interactions can be allowed to occur and their effect 
and extent determined. Design research has given set-ups which indicate 
the most effective use of the experimental animals thereby cutting costs 
of research in terms of time involved, feed required and monetary 
outlay, while at the same time increasing the precision of the estimates.” 
Much of Mr. Ashton’s time is taken up with guiding graduate students 
in the application of statistics to their problems. 


ENGLAND—N. T. Gridgeman, Eastham, Cheshire, writes, "Biometrics is good. I'm shocked to read that my friend D. J. Finney deprecates what he calls the 'social gossip'. News and Notes is an admirable
feature; one's natural curiosity about fellow scientists in other countries
is all too seldom met. Until a man dies or unless he does something
notable enough to engage the attention of the lay press, nothing but his
name and address are available. This is as surely wrong as your reparation is surely right." This is an encouraging note for those who assemble
this news about Biometrics subscribers. One more objection has been
received from a person who is concerned by the undignified "News and
Notes". Three down, but still we go... Major I. B. Perrott, Solihull,
Birmingham, is to serve as "News Editor" for the British Region of
The Biometric Society. Send your news to him. He has taken up recently 
a position in pure mathematics in the Department of Mathematics, 
University of Leeds. 


INDIA—Raj Chandra Bose has resigned as head of the graduate 
Department of Statistics, University of Calcutta. He has been appointed 
Professor of Mathematical Statistics, University of North Carolina, 
beginning in the Winter of 1949. Professor Bose is an authority on the
design of experiments and is writing a book on the combinatorial
mathematics of the subject. He served as Visiting Professor at Columbia
University during the Fall of 1947 and was at the Institute of Statistics 
during the Winter and Spring of 1948... C. Chandra Sekar, Professor 
of Statistics at the All-India Institute of Hygiene and Public Health at 
Calcutta, who was a student at The Johns Hopkins School of Hygiene 
and Public Health last year, is remaining in this country for a second 
year as a member of the Population Division of the United Nations. 


UNITED STATES—Isidore Altman, Biostatistician, Public Health 
Service, Washington, D. C., is directing his efforts toward problems in 
medical economics, particularly the collection of information on the 
number and distribution of medical personnel and facilities and on the 
cost of medical care. He writes, “Two illustrative studies are (a) an 
analysis of the supply of physicians in the District of Columbia and of 
their patient load, and (b) a study of the medical care sought by older 
persons in the Eastern Health District of Baltimore, with particular 
interest in chronic disease and its social and economic aspects.” . . .
Huldah Bancroft, The School of Medicine, Tulane University of Louisi- 
ana, New Orleans, has charge of teaching Biostatistics for the Depart- 
ment of Tropical Medicine and Public Health. She teaches an under- 
graduate course in biostatistics which is required of all sophomores in 
the School of Medicine. Also, a course is being given for faculty mem- 
bers. Miss Bancroft serves as a consultant for the school. She has as 
her assistant Margaret Allen who was formerly at Harvard with E. B. 
Wilson and, during the war, with the Navy... Edward W. Barankin 
was promoted to Assistant Professor and Research Associate at the 
Statistical Laboratory, University of California, Berkeley... Geoffrey
Beall moves from paper to glass. He was statistician at the Institute
of Paper Chemistry, Appleton, Wisconsin. He now has a similar position 
at the Preston Laboratories, which are concerned with research on glass. 
The Preston Laboratories are located at Butler, Pennsylvania. . . Bernice 
Brown, Project Rand, Santa Monica, California, does not like a state- 
ment that several biological statisticians have gone astray into industry. 
She responds, “I object to the word ‘astray’. There are gadgets in 
industry which behave very much like pigs and rats. The opportunity 
for learning more about statistics exists here under conditions not unlike 
those of an experiment station.” Is she sold on California! Is it the 
climate? ... W. V. Charter, Deputy Director, Medical Statistics Division,
Bureau of Medicine and Surgery, Navy Department, responded to an
inquiry regarding the work of his division thus: “We have approximately
75 people in this Division of whom 7 are professional statisticians 
comprising what we know as a Staff. The other personnel are divided 
into administration, editing and coding, tabulating, and machine opera- 
tions people. Since this is the focal point of all medical statistics in the 
Navy, we are responsible for format, organization, instructions, etc.,
concerning casualties, mortalities, and all other morbid conditions. We 
receive individual patient reports on every man who turns in, in the 
Navy, whether it be here in Washington, San Francisco or Shanghai!”
He also describes the vast amount of other medical statistical data that
the Division receives. The Division publishes a monthly magazine,
“Statistics of Navy Medicine.” The military services have compulsory
reporting as well as direct knowledge at all times of their current
population. . .
W. G. Cochran of the Institute of Statistics, University of North Caro- 
lina, has accepted an appointment as Professor of Biostatistics at the 
School of Hygiene and Public Health of The Johns Hopkins University.
He will take up his new post at the first of the year... James F.
Crow, formerly at Dartmouth College, Hanover, New Hampshire, is
now with the Department of Genetics, The University of Wisconsin,
Madison. He is teaching courses in genetics and doing Drosophila
research... W. Edwards Deming, Division of Statistics, Bureau of
the Budget, Washington, and his wife have returned from a two months’
sojourn in various parts of Europe. Mr. Deming attended the meeting
of the United Nations Sub-Commission on Statistical Sampling in
Geneva, and held consultations on sampling and the control of quality
in Rome, Milan, Paris, Amsterdam, Leiden, and Den Haag. He reports
much interest and progress in statistical methodology in all these
places: in government agencies, national standardizing bodies,
manufacturing industries, and public opinion and market research
organizations... Max Halperin, graduate student with the Department of
Mathematical Statistics, University of North Carolina, Chapel Hill, 
has joined the Project Rand group at Santa Monica, California... 
William F. Hewitt, Jr., was a physiologist-pharmacologist and literature 
scientist in the Literature Research Department, Smith, Kline and 
French Laboratories, Philadelphia. He is now an Assistant Professor of 
Physiology, School of Medicine, Howard University, Washington, D. C. 
He states, “In addition to teaching and conducting laboratory research,
I am trying to establish a literature-science unit, one of the functions of
which will be to instruct and consult in experimental design and judgment
of evidence.” Mr. Hewitt will welcome suggestions as to possible
activities of an academic group of this type... Joseph L. Hodges, Jr., 
has been promoted to Instructor and Research Associate at the Sta- 
tistical Laboratory, University of California, Berkeley... Carol M. 
Jeager is doing statistical analysis work with the Bureau of Ships, 
Navy Department, Washington. Miss Jeager was formerly with the 
Department of Agriculture’s Northern Regional Research Laboratory, 
Peoria, Illinois... E. L. Leclerg has left the Bureau of the Budget to 
join the Agricultural Research Administration in Washington as Re- 
search Coordinator responsible for the field of crop production. . . 
Douglas E. Scates has taken a new post with The American Council on 
Education, in charge of their research in scientific personnel for contracts 
sponsored by the Office of Naval Research. He was with the Department 
of Education, Duke University, Durham, North Carolina... D. M. 
Seath recently left the Louisiana Agricultural Experiment Station, Baton 
Rouge. He is now a Professor of Dairy Husbandry, University of 
Kentucky, Lexington, and is in charge of the Dairy Section... Arthur
G. Steinberg, who was with the Fels Research Institute, Antioch College,
Yellow Springs, Ohio, is now a member of the staff of the Division of 
Biometry and Medical Statistics, Mayo Clinic, Rochester, Minnesota. . . 
B. L. Wade is now head of the Department of Horticulture, University 
of Illinois, Urbana. He expects to put considerable emphasis on the 
development of graduate work in horticulture. Mr. Wade served for 
several years as Director of the Regional Horticulture Laboratory, 
Charleston, South Carolina. To him is due considerable credit for 
promoting cooperative research in the Southeast... J. Yerushalmy is 
now Professor of Biostatistics, School of Public Health, University of 
California, Berkeley. He is continuing his studies on statistical
problems in assessing methods of medical diagnosis. Mr. Yerushalmy
was formerly with the Tuberculosis Control Division, U. S. Public
Health Service, Bethesda, Maryland. It is some comfort to know
that others also face the problems of staffing a department and planning
its teaching and research program.